ASYMPTOTIC EQUIVALENCE OF FUNCTIONAL LINEAR
REGRESSION AND A WHITE NOISE INVERSE PROBLEM 

By Alexander Meister 

Universität Rostock

We consider the statistical experiment of functional linear regression
(FLR). Furthermore, we introduce a white noise model where one
observes an Itô process, which contains the covariance operator of the
corresponding FLR model in its construction. We prove asymptotic
equivalence of FLR and this white noise model in Le Cam's sense
under known design distribution. Moreover, we show equivalence of
FLR and an empirical version of the white noise model for finite sample
sizes. As an application, we derive sharp minimax constants in
the FLR model which are still valid in the case of unknown design
distribution.

1. Introduction. We consider the statistical problem of functional linear
regression (FLR). In its standard version, one observes the data $(\mathbf{X},\mathbf{Y})$,
where $\mathbf{X} = (X_1,\dots,X_n)^T$ consists of i.i.d. random variables taking their values in
$C([0,1])$, that is, the set of all continuous functions on the interval
$[0,1]$, and $\mathbf{Y} = (Y_1,\dots,Y_n)^T$ with

(1.1) $$Y_j = \langle X_j,\theta\rangle + \varepsilon_j, \qquad j = 1,\dots,n,$$

where $\langle\cdot,\cdot\rangle$ denotes the $L_2([0,1])$-inner product throughout this work. The
i.i.d. error variables $\varepsilon_j$ are assumed to be centered and normally distributed
with the variance $\sigma^2$. Moreover, $X_1,\varepsilon_1,\dots,X_n,\varepsilon_n$ are all independent. The
goal is to estimate the regression function $\theta \in \Theta \subseteq L_2([0,1])$. In general,
we allow for a structure of the function class $\Theta$ which does not determine
$\theta$ up to finitely many real-valued parameters. Thus we consider a
nonparametric estimation problem. Moreover, we assume that $EX_1 = 0$ and
$P[\|X_1\|_2 > x] \le C_{X,0}\exp(-C_{X,1}x^{C_{X,2}})$ for all $x > 0$ and some finite constants
$C_{X,0}, C_{X,1}, C_{X,2} > 0$, where $\|\cdot\|_p$, $p \ge 1$, denotes the $L_p([0,1])$-norm of an
element of that space. Thus the tails of the design distribution are restricted.
Such conditions are usual in nonparametric regression problems.
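To make model (1.1) concrete, the following Python sketch simulates FLR data. It rests on assumptions that are not part of the model: the design curves are standard Brownian motions sampled on an equispaced grid, the inner product is replaced by a Riemann sum, and the function name `simulate_flr` is hypothetical.

```python
import numpy as np

def simulate_flr(n, theta, sigma, grid_size=200, rng=None):
    """Simulate model (1.1) on the grid t_i = i / grid_size (illustrative sketch).

    Assumptions: X_j is a standard Brownian motion, and <f, g> is approximated
    by the Riemann sum h * sum_i f(t_i) g(t_i) with h = 1 / grid_size.
    """
    rng = np.random.default_rng() if rng is None else rng
    h = 1.0 / grid_size
    t = h * np.arange(1, grid_size + 1)
    # Brownian paths: cumulative sums of independent N(0, h) increments.
    X = np.cumsum(rng.normal(scale=np.sqrt(h), size=(n, grid_size)), axis=1)
    signal = h * X @ theta(t)          # <X_j, theta>, j = 1, ..., n
    eps = rng.normal(size=n)           # i.i.d. N(0, 1) errors
    Y = signal + sigma * eps
    return t, X, Y, signal

t, X, Y, signal = simulate_flr(50, lambda s: np.sin(np.pi * s), sigma=0.0,
                               rng=np.random.default_rng(0))
```

With `sigma=0.0` the responses coincide with the inner products $\langle X_j,\theta\rangle$, which gives a quick consistency check. Increasing `sigma` adds the Gaussian errors $\varepsilon_j$.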

The FLR model has received considerable attention in the statistical
community during recent years, which is reflected in the large amount of
literature on this topic. Various estimation procedures have been proposed
to make the regression function $\theta$ empirically accessible (see, e.g., [6-8, 12,
13]). The minimax convergence rates in FLR are investigated, for example, 
in [5, 8, 15]. In [4], adaptive estimation in FLR is considered. Generalizations 
of FLR are discussed in [18]. A central limit theorem for FLR is derived 
in [9]. In [24], practical applications of FLR in the field of medical statistics 
are described; the authors consider two real data sets on primary biliary 
cirrhosis and systolic blood pressure. For a comprehensive introduction to 
the field of functional data analysis in general, see [21]. 

In order to compare two statistical models, it is useful to prove asymptotic
equivalence between those models. For the basic concept and a detailed description
of this strong asymptotic property, we refer to [16] and [17]. Also, a
review of this topic is given in the following section. As an important feature,
if two models $\mathfrak{E}_{1,n}$ and $\mathfrak{E}_{2,n}$ are asymptotically equivalent, then $\mathfrak{E}_{1,n}$ adopts the
optimal convergence rates and sharp asymptotic constants with respect to
any bounded loss function from model $\mathfrak{E}_{2,n}$ and vice versa. Thus, the theory
of asymptotic equivalence does not only capture special loss functions
such as the mean integrated squared error (MISE) or the pointwise mean
squared error (MSE) but includes various types of semi-metrics between the
estimator and the target function $\theta$ and also addresses the estimation of
characteristics of $\theta$, such as its support or its mode. Furthermore, superefficiency
phenomena also coincide in both models when considering subclasses
$\Theta'$ of the target parameter space $\Theta$. In particular, research has focused
on proofs of asymptotic equivalence between experiments where $n$ i.i.d. data are
observed, whose distribution depends on some parameter $\theta \in \Theta$, and experiments
where $\theta$ occurs in the drift of an empirically accessible Itô process.
For instance, Nussbaum [19] considers an asymptotically equivalent white
noise model for density estimation, while Brown and Low [2] introduce such
a model for nonparametric regression. In recent related literature on regression
problems, Carter [10] studies the case of unknown error variance, and
Reiss [22] extends asymptotic equivalence to the multivariate setting.

Returning to model (1.1), we suppose that the nuisance parameters $\sigma$
and $P_X$, that is, the distribution of the $X_j$, are known. That allows us to
exclude those quantities from the parameter space of the experiment and
to fully concentrate on the estimation of $\theta$. This condition is also imposed
in most papers dealing with asymptotic equivalence for nonparametric regression
experiments. The work of [10] represents an exception where the
corresponding white noise model becomes more difficult and, apparently, less
useful for deriving transferable asymptotic properties. With respect to asymptotic
equivalence, we restrict our consideration to the case of known $P_X$.
However, in Section 5, we will show that the sharp minimax asymptotics
with respect to the MISE extend to the case of unknown design
distribution.

The main purpose of the current work is to prove asymptotic equivalence
of model (1.1) and a statistical inverse problem in the white noise setting.
The latter model is described by the observation of an Itô process $Y(t)$,
$t \in [0,1]$, $Y(0) = 0$, driven by the stochastic differential equation

(1.2) $$dY(t) = [K\theta](t)\,dt + n^{-1/2}\sigma\,dW(t),$$

where $W(t)$ denotes a standard Wiener process on the interval $[0,1]$, and
$K$ denotes a linear operator mapping the Hilbert space $L_2([0,1])$ to
itself. These models are also widely studied in mathematical statistics (see,
e.g., [11] and [14]). They have applications in the fields of signal deblurring
and econometrics. We will concentrate on a specific version of
model (1.2) where $K$ is equal to the unique positive symmetric square
root $\Gamma^{1/2}$ of the covariance operator $\Gamma$, that is, $\Gamma^{1/2}\Gamma^{1/2} = \Gamma$ and
$\Gamma f = \int_0^1 EX_1(\cdot)X_1(t)f(t)\,dt$ for any $f \in L_2([0,1])$. Thus, the observation $Y(t)$,
$t \in [0,1]$, is defined by $Y(0) = 0$ and

(1.3) $$dY(t) = [\Gamma^{1/2}\theta](t)\,dt + n^{-1/2}\sigma\,dW(t).$$

In [8], the authors remark on the similarity of models (1.1) and (1.3). In
the current paper, we will rigorously establish asymptotic equivalence between
those models. As an interesting feature, additional observation of the
data $X_1,\dots,X_n$ would be redundant in model (1.3): all information about
the design points is recorded by $\Gamma$ in (1.3). Therefore, all that is observed
in the corresponding white noise experiment is the process $Y(t)$, $t \in [0,1]$.
After the general introduction to the property of asymptotic equivalence as
used in the current paper in Section 2, we will first prove (nonasymptotic)
equivalence of model (1.1) and an empirical version of model (1.3), where
$\Gamma$ is replaced by a noisy counterpart, in Section 3. In Section 4, we prove
asymptotic equivalence of (1.1) and (1.3) under some additional technical
conditions. In Section 5, we show that the sharp lower bound which follows
from the results of the previous section can be attained by specific estimators
in the realistic case of unknown design distribution. A discussion of the
findings and their conclusions is provided in Section 6.

2. Asymptotic equivalence. To recall the definition of asymptotic equivalence,
we consider two (sequences of) statistical experiments
$\mathfrak{E}_{j,n} = (\Omega_{j,n},\mathfrak{A}_{j,n},(P_{j,n,\theta})_{\theta\in\Theta})$, $j = j_1, j_2$, with a joint parameter space $\Theta$, which may depend
on $n$. The Le Cam distance between $\mathfrak{E}_{j_1,n}$ and $\mathfrak{E}_{j_2,n}$ is defined by

$$\Delta(\mathfrak{E}_{j_1,n},\mathfrak{E}_{j_2,n}) = \max_{k=1,2}\ \inf_{K\in\mathfrak{T}_{j_k,n}}\ \sup_{\theta\in\Theta}\big\|K(P_{j_k,n,\theta}) - P_{j_{3-k},n,\theta}\big\|_{TV},$$

where $\|\cdot\|_{TV}$ is the total variation distance, and $\mathfrak{T}_{j_k,n}$ denotes the collection
of so-called transitions (see [23] and [19] for their exact definition). The
statistical experiments $\mathfrak{E}_{j_1,n}$ and $\mathfrak{E}_{j_2,n}$ are called asymptotically equivalent
if $\Delta(\mathfrak{E}_{j_1,n},\mathfrak{E}_{j_2,n})$ converges to zero as $n \to \infty$, while they are called equivalent
if $\Delta(\mathfrak{E}_{j_1,n},\mathfrak{E}_{j_2,n}) = 0$ for all $n$.

In the framework of our note, we will not use that general definition of
(asymptotic) equivalence, but our proofs rely on the following sufficient
conditions for these properties:

(i) We consider the following sufficient condition for asymptotic equivalence
of $\mathfrak{E}_{j_1,n}$ and $\mathfrak{E}_{j_2,n}$: We define the sets $\mathcal{R}_{j,n,\theta}$, $j = j_1, j_2$, $\theta \in \Theta$, which
contain all real-valued integrable random variables $R$ on the domain $\Omega_{j,n}$
satisfying $|R| \le 1$ a.s. Thus any kind of bounded loss function is captured
by the classes $\mathcal{R}_{j,n,\theta}$, so that the expectation $ER$ with respect to the distribution
$P_{j,n,\theta}$ describes an arbitrary bounded and normalized statistical risk
for estimating the parameter $\theta$ under the observation scheme $\mathfrak{E}_{j,n}$. Now we
define two sequences $(T_{j_k,j_{3-k},n})_n$, $k = 1,2$, of $(\mathfrak{A}_{j_k,n},\mathfrak{A}_{j_{3-k},n})$-measurable
mappings from $\Omega_{j_k,n}$ to $\Omega_{j_{3-k},n}$. As an essential condition, the $T_{j_k,j_{3-k},n}$ must not
depend on $\theta$. Hence, $T_{j_k,j_{3-k},n}$ may be interpreted as a transformation of
the data from an observation contained in the space $\Omega_{j_k,n}$ to an observation
which lies in $\Omega_{j_{3-k},n}$. Thus a statistician who intends to construct an estimation
procedure for $\theta$ may always apply this transformation $T_{j_k,j_{3-k},n}$ to
an observation $\omega \in \Omega_{j_k,n}$. Then we obtain asymptotic equivalence of $\mathfrak{E}_{j_1,n}$
and $\mathfrak{E}_{j_2,n}$ when we can show the existence of such transformation sequences
$(T_{j_k,j_{3-k},n})_n$, $k = 1,2$, so that

(2.1) $$\sup_{\theta\in\Theta}\ \sup_{R_{j_{3-k},n,\theta}\in\mathcal{R}_{j_{3-k},n,\theta}}\big|ER_{j_{3-k},n,\theta} - E\big(R_{j_{3-k},n,\theta}\circ T_{j_k,j_{3-k},n}\big)\big| \longrightarrow 0$$

as $n \to \infty$ for all $k = 1,2$, where the first expectation is taken under $P_{j_{3-k},n,\theta}$
and the second one under $P_{j_k,n,\theta}$. Accordingly, we have equivalence if the left-hand
side in (2.1) equals 0 for any $n$. Intuitively speaking, after transforming the
data drawn from model $\mathfrak{E}_{j_1,n}$ according to $T_{j_1,j_2,n}$, the distance between any
bounded statistical risk in model $\mathfrak{E}_{j_2,n}$ on the one hand and the corresponding risk for the transformed
data from model $\mathfrak{E}_{j_1,n}$ on the other hand becomes small for large $n$ or is equal to zero for any $n$,
respectively. The same condition must also hold true when exchanging $j_1$
and $j_2$.

In the specific framework of our note, we assume, in addition, that none of the
transformations $T_{j_k,j_{3-k},n}$ depends on the nuisance parameter $\sigma$.
That compensates for the unrealistic condition of known $\sigma$. In particular, $\sigma$
is not used to transform the data or to construct decision procedures or
estimators. Therefore, our results also address the case of unknown $\sigma$.
Nevertheless, $\sigma$ must be viewed as uninteresting for the statistician, that is,
it must not explicitly occur in the loss functions $R_{j_k,n,\theta}$. Thus, the problem
of estimating $\sigma$ is not covered by our approach.
(ii) Assume that the experiment $\mathfrak{E}_{j_2,n}$ describes the observation of $T(\omega)$
for $\omega \in \Omega_{j_1,n}$ in experiment $\mathfrak{E}_{j_1,n}$, where $T$ is a sufficient statistic for $\theta$ in
experiment $\mathfrak{E}_{j_1,n}$. Then $\mathfrak{E}_{j_1,n}$ and $\mathfrak{E}_{j_2,n}$ are statistically equivalent (i.e., their
Le Cam distance vanishes) and, hence, asymptotically equivalent. That assertion
holds true whenever the sample spaces of the experiments are Polish spaces. This criterion
is satisfied, as all sample spaces considered in the current work are
$C([0,1])$, $L_2([0,1])$ and some Cartesian products of those classes (see Lemma 3.2
in [2]).

(iii) If some experiments $\mathfrak{E}_{j_1,n}$ and $\mathfrak{E}_{j_2,n}$ on the one hand, and $\mathfrak{E}_{j_2,n}$ and $\mathfrak{E}_{j_3,n}$
on the other hand, are (asymptotically) equivalent, then $\mathfrak{E}_{j_1,n}$ and $\mathfrak{E}_{j_3,n}$ are
(asymptotically) equivalent, too. Also, (asymptotic) equivalence of $\mathfrak{E}_{j_1,n}$ and
$\mathfrak{E}_{j_2,n}$ is a symmetric relation between the experiments.

(iv) Assume that some experiments $\mathfrak{E}_{j_1,n}$ and $\mathfrak{E}_{j_2,n}$ may be decomposed
into two independent experiments $\mathfrak{E}_{j_1,n,k}$ and $\mathfrak{E}_{j_2,n,k}$, $k = 1,2$, respectively.
Moreover, we suppose that the experiments $\mathfrak{E}_{j_1,n,1}$ and $\mathfrak{E}_{j_2,n,1}$ on the one hand
and the experiments $\mathfrak{E}_{j_1,n,2}$ and $\mathfrak{E}_{j_2,n,2}$ on the other hand are (asymptotically)
equivalent. Then, the combined experiments $\mathfrak{E}_{j_1,n}$ and $\mathfrak{E}_{j_2,n}$ are
(asymptotically) equivalent as well.

Now, $\mathfrak{E}_{1,n}$ denotes the underlying experiment of the FLR model (1.1);
it is defined by $\Omega_{1,n} = C([0,1])^{(n)} \times \mathbb{R}^n$, and $\mathfrak{A}_{1,n}$ denotes the Borel $\sigma$-algebra
when considering the uniform metric on the functional components and the
Euclidean metric on the real-valued components of $\Omega_{1,n}$. The corresponding
probability measures $P_{1,n,\theta}$ are well defined by the assumptions of the
model (1.1). The parameter space $\Theta \subseteq L_2([0,1])$ will be specified later. Still,
the observations $(\mathbf{X},\mathbf{Y})$ may be viewed as random variables having their
domain on some basic probability space $(\Omega,\mathfrak{A},P)$.

3. Empirical covariance operator. We define the linear covariance operator
$\Gamma: L_2([0,1]) \to L_2([0,1])$ by

$$\Gamma f = \int_0^1 EX_1(\cdot)X_1(t)f(t)\,dt \qquad \forall f \in L_2([0,1]).$$

Writing $K(s,t) = EX_1(s)X_1(t)$, we realize that $\Gamma$ is a Hilbert-Schmidt integral
operator, where

$$\int_0^1\int_0^1 |K(s,t)|^2\,ds\,dt \le \big(E\|X_1\|_2^2\big)^2 < \infty,$$

by the Cauchy-Schwarz inequality and the tail condition imposed on the
distribution of $\|X_1\|_2$. Hence $\Gamma$ is a continuous and compact operator. We
have $K(s,t) = K(t,s)$ for all $s,t \in [0,1]$, so that the operator $\Gamma$ is self-adjoint.
Furthermore, it is positive; that is, by Fubini's theorem we have

$$\langle f,\Gamma f\rangle = E|\langle X_1,f\rangle|^2 \ge 0$$

for any $f \in L_2([0,1])$.
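For a concrete case, one may take $X_1$ to be a standard Brownian motion, for which $K(s,t) = \min(s,t)$, $\int_0^1\int_0^1 K(s,t)^2\,ds\,dt = 1/6$ and $E\|X_1\|_2^2 = \int_0^1 t\,dt = 1/2$. The sketch below assumes this design and a Riemann-sum discretization of $\Gamma$ (our choices, not the paper's) and checks the three properties just derived: symmetry, positivity and the Hilbert-Schmidt bound.

```python
import numpy as np

def brownian_covariance_matrix(grid_size):
    """Discretize Gamma for a standard Brownian motion, K(s, t) = min(s, t).

    Gamma f is approximated by G @ f with G[i, k] = K(t_i, t_k) * h,
    i.e. a Riemann sum of the integral operator on the grid t_i = i * h.
    """
    h = 1.0 / grid_size
    t = h * np.arange(1, grid_size + 1)
    K = np.minimum.outer(t, t)
    return t, h, K, K * h

t, h, K, G = brownian_covariance_matrix(400)
hs_norm_sq = np.sum(K**2) * h * h        # approximates the double integral of K^2
second_moment = np.sum(np.diag(K)) * h   # approximates E||X_1||_2^2 = int_0^1 t dt
eigs = np.linalg.eigvalsh(G)             # G is symmetric, so eigvalsh applies
```

On this grid, `hs_norm_sq` is close to $1/6$ and stays below `second_moment ** 2`, which is close to $1/4$, in line with the Cauchy-Schwarz bound.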
Then some well-known results from functional analysis, in particular
spectral theory for compact operators, may be applied. There exists an
orthonormal basis $\{\varphi_j\}_{j\ge1}$ of the separable Hilbert space $L_2([0,1])$ which
consists of eigenfunctions of $\Gamma$. The corresponding eigenvalues are denoted
by $\lambda_j \ge 0$. The sequence $(\lambda_n)_n$ converges to zero and may be assumed to be
monotonically decreasing without loss of generality. Those results are also
used, for example, in [5]. Furthermore, for $\Gamma$, as for any compact self-adjoint
positive operator from $L_2([0,1])$ to itself, there exists a unique compact
self-adjoint positive operator $\Gamma^{1/2}$ from $L_2([0,1])$ to itself such that $(\Gamma^{1/2})^2 = \Gamma$;
then $\Gamma^{1/2}$ is called the square root of $\Gamma$. We have $\Gamma^{1/2}\varphi_j = \lambda_j^{1/2}\varphi_j$ for any
$j \ge 1$.
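On a grid, the square root $\Gamma^{1/2}$ can be formed exactly as described, by taking square roots of the eigenvalues in the spectral decomposition. The following sketch assumes the discretized Brownian-motion operator used above (an illustrative choice); for this design the leading eigenvalue is known to be $4/\pi^2$.

```python
import numpy as np

def operator_sqrt(G):
    """Square root of a symmetric positive semi-definite matrix via its
    spectral decomposition, mirroring Gamma^{1/2} phi_j = lambda_j^{1/2} phi_j.

    Returns the square root and the eigenpairs in decreasing order.
    """
    lam, V = np.linalg.eigh(G)                 # ascending eigenvalues
    lam = np.clip(lam, 0.0, None)              # drop tiny negative round-off
    G_sqrt = (V * np.sqrt(lam)) @ V.T          # V diag(sqrt(lam)) V^T
    return G_sqrt, lam[::-1], V[:, ::-1]

grid_size = 400
h = 1.0 / grid_size
t = h * np.arange(1, grid_size + 1)
G = np.minimum.outer(t, t) * h                 # discretized Gamma, Brownian design
G_sqrt, lam, V = operator_sqrt(G)
phi_1 = V[:, 0] / np.sqrt(h)                   # leading eigenfunction, L2-normalized
```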

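We may define an empirically accessible version $\hat\Gamma$ of $\Gamma$ by replacing the
expectation by the average; more precisely, we have

$$\hat\Gamma f = \int_0^1 \frac1n\sum_{j=1}^n X_j(\cdot)X_j(t)f(t)\,dt = \frac1n\sum_{j=1}^n\langle X_j,f\rangle X_j \qquad \forall f \in L_2([0,1]).$$

Thus, $\hat\Gamma$ may be viewed as the operator $\Gamma$ when $P_X$ equals the uniform
distribution on the discrete set $\{X_1,\dots,X_n\}$. Therefore, all properties derived
for $\Gamma$ in the previous paragraph can be taken over to $\hat\Gamma$. In particular, $\hat\varphi_j$,
integer $j \ge 1$, denotes the orthonormal basis of the eigenfunctions of $\hat\Gamma$ with
the eigenvalues $\hat\lambda_j$.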
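Since $\hat\Gamma f$ is always a linear combination of $X_1,\dots,X_n$, a discretized $\hat\Gamma$ has rank at most $n$. With Brownian design curves, which satisfy condition (3.3) introduced below, the rank is exactly $n$, in line with the argument given later in this section. The sketch assumes a grid discretization, and `empirical_covariance_matrix` is a hypothetical helper, not an established API.

```python
import numpy as np

def empirical_covariance_matrix(X, h):
    """Discretized Gamma_hat: f -> (1/n) sum_j X_j <X_j, f>, as a matrix acting
    on grid values of f (Riemann-sum inner product with weight h)."""
    n = X.shape[0]
    return X.T @ X * h / n

rng = np.random.default_rng(1)
grid_size, n = 300, 20
h = 1.0 / grid_size
X = np.cumsum(rng.normal(scale=np.sqrt(h), size=(n, grid_size)), axis=1)
G_hat = empirical_covariance_matrix(X, h)
lam_hat = np.linalg.eigvalsh(G_hat)[::-1]  # eigenvalues in decreasing order
```

Exactly `n` eigenvalues are clearly positive. The remaining ones vanish up to round-off.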

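Now we consider the conditional probability density $p_{1,n,\theta}(y_1,\dots,y_n \mid X_1,\dots,X_n)$
of the data $Y_1,\dots,Y_n$ given the design functional observations
$X_1,\dots,X_n$ in model (1.1). This density shall be understood with respect to
the $n$-dimensional Lebesgue measure. We derive that

(3.1) $$\begin{aligned} p_{1,n,\theta}(y_1,\dots,y_n \mid X_1,\dots,X_n) &= (2\pi)^{-n/2}\sigma^{-n}\prod_{j=1}^n\exp\Big(-\frac{1}{2\sigma^2}\big(y_j - \langle X_j,\theta\rangle\big)^2\Big)\\ &= (2\pi)^{-n/2}\sigma^{-n}\exp\big(-\|\mathbf{y}-\mathbf{x}\|^2/(2\sigma^2)\big),\end{aligned}$$

with the vectors $\mathbf{y} = (y_1,\dots,y_n)^T$ and $\mathbf{x} = (\langle X_1,\theta\rangle,\dots,\langle X_n,\theta\rangle)^T$. Moreover,
$\|\cdot\|$ denotes the Euclidean norm. Expanding $\theta \in \Theta \subseteq L_2([0,1])$ in the
orthonormal basis $\{\hat\varphi_j\}_{j\ge1}$ gives us that

(3.2) $$\langle X_j,\theta\rangle = \sum_{k=1}^\infty\langle X_j,\hat\varphi_k\rangle\langle\hat\varphi_k,\theta\rangle.$$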

We impose the following condition on the distribution $P_X$:

(3.3) $$P[X_1 \in L] = 0 \quad \text{for any deterministic linear subspace } L \subseteq L_2([0,1]) \text{ with } \dim L < \infty.$$
Intuitively, this assumption ensures that the probability mass of the $X_j$
fills the whole of $L_2([0,1])$. In a sense, (3.3) is the functional data analog
of continuity of the distribution of a real-valued random variable. It is
satisfied when we take an appropriate Gaussian process for $X_1$, for instance.
Condition (3.3) yields that the linear space generated by $X_1,\dots,X_n$ is
$n$-dimensional almost surely. Otherwise, at least one of the $X_j$ must be included
in the linear hull of the other design variables. According to (3.3), that occurs
with probability zero under the conditional probability measure
given the data $X_1,\dots,X_{j-1},X_{j+1},\dots,X_n$. Finally, applying the expectation,
we obtain the desired result for the unconditional distribution.

We realize that the range of $\hat\Gamma$ is included in the linear hull of $X_1,\dots,X_n$.
By definition, $\hat\varphi_j$ is contained in that $n$-dimensional space whenever $\hat\lambda_j > 0$.
As the $\hat\varphi_j$ form an orthonormal basis, at most $n$ of the eigenvalues $\hat\lambda_j$ are
nonvanishing. Furthermore, the linear independence of $X_1,\dots,X_n$ implies
that the functions $\hat\Gamma X_k = n^{-1}\sum_{j=1}^n\langle X_j,X_k\rangle X_j$, $k = 1,\dots,n$, are linearly
independent, too (their coefficient matrix is the Gram matrix of $X_1,\dots,X_n$, which is
invertible), so that the range of $\hat\Gamma$ is equal to the linear hull of
$X_1,\dots,X_n$. Clearly, the range of $\hat\Gamma$ also coincides with the linear hull of all
$\hat\varphi_j$ with $\hat\lambda_j > 0$, from which it follows that $\langle X_j,\hat\varphi_k\rangle = 0$ for all $j = 1,\dots,n$ and $k > n$.
Also, we have $\hat\lambda_j > 0$ for $j = 1,\dots,n$ and $\hat\lambda_j = 0$ for $j > n$. Hence, (3.2) leads
to the representation

(3.4) $$\langle X_j,\theta\rangle = \sum_{k=1}^n\langle X_j,\hat\varphi_k\rangle\langle\hat\varphi_k,\theta\rangle$$

for all $j = 1,\dots,n$. Equation (3.4) is equivalent to the system of linear
equations $\mathbf{x} = Q\mathbf{f}$ with the vector $\mathbf{f} = (\langle\hat\varphi_1,\theta\rangle,\dots,\langle\hat\varphi_n,\theta\rangle)^T$ and the matrix $Q$
with the components $Q_{j,k} = \langle X_j,\hat\varphi_k\rangle$, $j,k = 1,\dots,n$. Then the conditional
density $p_{1,n,\theta}$ as in (3.1) may be written as

(3.5) $$p_{1,n,\theta}(y_1,\dots,y_n \mid X_1,\dots,X_n) = (2\pi)^{-n/2}\sigma^{-n}\exp\big(-\|\mathbf{y}-Q\mathbf{f}\|^2/(2\sigma^2)\big).$$
We observe that the $(k,k')$th component of the matrix $Q^TQ$ is equal to

$$\sum_{j=1}^n\langle X_j,\hat\varphi_k\rangle\langle X_j,\hat\varphi_{k'}\rangle = n\langle\hat\Gamma\hat\varphi_k,\hat\varphi_{k'}\rangle = n\hat\lambda_k\delta_{k,k'}.$$

Thus $Q^TQ$ is a diagonal matrix containing $n\hat\lambda_k$ as its $(k,k)$th component.

We denote the diagonal matrix having $n^{1/2}\hat\lambda_k^{1/2}$ as its $(k,k)$th component
by $D$. Since $\hat\lambda_k > 0$ for $k \le n$, $D$ is invertible, and we define $A = QD^{-1}$. We have

$$A^TA = D^{-1}Q^TQD^{-1} = I,$$

where $I$ denotes the identity matrix. Also, this yields that $AA^T = I$ and
that $A$ is an orthogonal matrix. Thus, $\|A\mathbf{v}\| = \|\mathbf{v}\|$ for any vector $\mathbf{v} \in \mathbb{R}^n$.
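The identities $Q^TQ = n\,\mathrm{diag}(\hat\lambda_k)$ and $A^TA = AA^T = I$ can be verified numerically. The following sketch assumes Brownian design curves on an equispaced grid and Riemann-sum inner products; the eigenfunctions $\hat\varphi_k$ are the grid eigenvectors rescaled to unit $L_2$ norm.

```python
import numpy as np

rng = np.random.default_rng(2)
grid_size, n = 300, 15
h = 1.0 / grid_size
X = np.cumsum(rng.normal(scale=np.sqrt(h), size=(n, grid_size)), axis=1)

G_hat = X.T @ X * h / n                      # discretized empirical covariance
lam_hat, V = np.linalg.eigh(G_hat)
lam_hat, V = lam_hat[::-1][:n], V[:, ::-1][:, :n]   # n leading eigenpairs
phi_hat = V / np.sqrt(h)                     # L2([0,1])-normalized eigenfunctions

Q = h * X @ phi_hat                          # Q[j, k] = <X_j, phi_hat_k>
D = np.diag(np.sqrt(n * lam_hat))            # D = diag(n^{1/2} lam_hat_k^{1/2})
A = Q @ np.linalg.inv(D)                     # A = Q D^{-1}
```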
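Equality (3.5) provides that

(3.6) $$\begin{aligned} p_{1,n,\theta}(y_1,\dots,y_n \mid X_1,\dots,X_n) &= (2\pi)^{-n/2}\sigma^{-n}\exp\big(-\|AA^T\mathbf{y}-AD\mathbf{f}\|^2/(2\sigma^2)\big)\\ &= (2\pi)^{-n/2}\sigma^{-n}\exp\big(-\|A^T\mathbf{y}-D\mathbf{f}\|^2/(2\sigma^2)\big).\end{aligned}$$

Referring to the notation of (2.1), we consider the expectation $ER_{1,n,\theta}(\mathbf{X},\mathbf{Y})$,
where $R_{1,n,\theta} \in \mathcal{R}_{1,n,\theta}$. We derive that

(3.7) $$\begin{aligned} ER_{1,n,\theta}(\mathbf{X},\mathbf{Y}) &= E\,E\big(R_{1,n,\theta}(\mathbf{X},\mathbf{Y}) \mid \mathbf{X}\big)\\ &= E\int\!\!\cdots\!\!\int R_{1,n,\theta}(X_1,\dots,X_n;y_1,\dots,y_n)\,p_{1,n,\theta}(y_1,\dots,y_n \mid X_1,\dots,X_n)\,dy_1\cdots dy_n\\ &= E\int\!\!\cdots\!\!\int R_{1,n,\theta}(X_1,\dots,X_n;A\mathbf{z})(2\pi)^{-n/2}\sigma^{-n}\exp\big(-\|\mathbf{z}-D\mathbf{f}\|^2/(2\sigma^2)\big)\,dz_1\cdots dz_n\\ &= ER_{1,n,\theta}(\mathbf{X},A\mathbf{Z}),\end{aligned}$$

where $\mathbf{Z} = (Z_1,\dots,Z_n)^T$ denotes a vector whose components are, conditionally on the
$\sigma$-algebra generated by $\mathbf{X}$, independent and normally distributed, where $Z_k$ has the mean
$n^{1/2}\hat\lambda_k^{1/2}\langle\hat\varphi_k,\theta\rangle$ (the $k$th component of $D\mathbf{f}$) and the variance $\sigma^2$. Therefore,
the $Z_k$ may be represented as

(3.8) $$Z_k = \langle\hat\varphi_k, n^{1/2}\hat\Gamma^{1/2}\theta\rangle + \sigma\varepsilon_k, \qquad k = 1,\dots,n,$$

where $\varepsilon_1,\dots,\varepsilon_n$ are i.i.d. $N(0,1)$-distributed random variables. The $\varepsilon_j$ are
independent of the $\sigma$-algebra generated by $\mathbf{X}$. We have applied the integral
transformation $\mathbf{y} = A\mathbf{z}$, where $\det A = \pm1$ due to the orthogonality of $A$.
Note that the signs of the eigenfunctions $\hat\varphi_j$ may still be chosen; we can
arrange that $\det A = 1$.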
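The key step in (3.7) is that $A^T\mathbf{Y}$ has conditional mean $D\mathbf{f}$, while the orthogonality of $A$ keeps the noise i.i.d. $N(0,\sigma^2)$. The sketch below checks the mean identity $A^T\mathbf{x} = D\mathbf{f}$ and the invertibility of the data transformation on simulated data. The Brownian design, the grid and the choice $\theta(t) = \cos(2\pi t)$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
grid_size, n, sigma = 300, 15, 0.5
h = 1.0 / grid_size
t = h * np.arange(1, grid_size + 1)
X = np.cumsum(rng.normal(scale=np.sqrt(h), size=(n, grid_size)), axis=1)
theta = np.cos(2 * np.pi * t)                # hypothetical regression function

G_hat = X.T @ X * h / n
lam_hat, V = np.linalg.eigh(G_hat)
lam_hat, phi_hat = lam_hat[::-1][:n], V[:, ::-1][:, :n] / np.sqrt(h)
Q = h * X @ phi_hat
D = np.sqrt(n * lam_hat)                     # diagonal of D, stored as a vector
A = Q / D                                    # A = Q D^{-1} (column scaling)

x = h * X @ theta                            # x_j = <X_j, theta>
Y = x + sigma * rng.normal(size=n)           # FLR responses, model (1.1)
Z = A.T @ Y                                  # transformed data, cf. (3.7)
f = h * phi_hat.T @ theta                    # f_k = <phi_hat_k, theta>
```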

Now we define the statistical experiment $\mathfrak{E}_{2,n}$ with the same parameter
space as $\mathfrak{E}_{1,n}$, $(\Omega_{2,n},\mathfrak{A}_{2,n}) = (\Omega_{1,n},\mathfrak{A}_{1,n})$ and $P_{2,n,\theta}$ as the probability
measure generated by the random variable $(\mathbf{X},\mathbf{Z})$ with $\mathbf{Z}$ as in (3.7). In the
notation of Section 2, paragraph (i), we use the mapping $T_{2,1,n}: \Omega_{2,n} \to \Omega_{1,n}$
defined by $T_{2,1,n}(\mathbf{x},\mathbf{z}) = (\mathbf{x},A\mathbf{z})$, $\mathbf{x} \in C([0,1])^{(n)}$, $\mathbf{z} \in \mathbb{R}^n$, as the data
transformation from $\mathfrak{E}_{2,n}$ to $\mathfrak{E}_{1,n}$. By definition, the matrix $A$ does not depend
on the parameter $\theta$ but only on the data $X_1,\dots,X_n$ and the orthonormal basis
$\{\hat\varphi_j\}_{j\ge1}$, which is computable from these data. Also, it does not depend on $\sigma$, as requested in the
previous section. We have already derived that $A$ is an orthogonal matrix,
so that $T_{2,1,n}$ is a bijective mapping from the set $C([0,1])^{(n)} \times \mathbb{R}^n$ to itself.
Hence, its inverse mapping $T_{2,1,n}^{-1}$ may be used as the data transformation
$T_{1,2,n}$. Then, according to (2.1), we have proved the following lemma.

Lemma 3.1. Under condition (3.3), the statistical experiments $\mathfrak{E}_{1,n}$ and
$\mathfrak{E}_{2,n}$ are equivalent.

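The random variables $\varepsilon_j$, integer $j$, as occurring in (3.8), may be represented
by

$$\varepsilon_j = \int_0^1\hat\varphi_j(t)\,dW(t),$$

where $W$ denotes a standard Wiener process on $[0,1]$ which is independent
of $\mathbf{X}$. We deduce that $\varepsilon_1,\varepsilon_2,\dots$ form an independent sequence of $N(0,1)$-distributed
random variables. Moreover, they are independent of $X_1,\dots,X_n$,
although $\hat\varphi_j$ depends on these design variables. That can be shown via the
conditional characteristic function of $(\varepsilon_1,\varepsilon_2,\dots)$ given $X_1,\dots,X_n$; that is,

$$E\Big(\exp\Big(i\sum_{m=1}^\infty s_m\varepsilon_m\Big)\,\Big|\,X_1,\dots,X_n\Big) = \exp\Big(-\frac12\Big\|\sum_{m=1}^\infty s_m\hat\varphi_m\Big\|_2^2\Big) = \exp\Big(-\frac12\sum_{m=1}^\infty s_m^2\Big)$$

for all real-valued sequences $(s_m)_{m\ge1}$ with $\sum_{m=1}^\infty s_m^2 < \infty$. Applying the
expectation to the above equality, the unconditional characteristic function
of $(\varepsilon_1,\varepsilon_2,\dots)$ turns out to coincide with the conditional one. We have

(3.9) $$Z_j = \langle\hat\varphi_j, n^{1/2}\hat\Gamma^{1/2}\theta\rangle + \sigma\int_0^1\hat\varphi_j(t)\,dW(t) = \int_0^1\hat\varphi_j(t)\,dZ(t)$$

for all $j = 1,\dots,n$, where $Z(t)$, $t \in [0,1]$, denotes an Itô process satisfying

(3.10) $$dZ(t) = n^{1/2}[\hat\Gamma^{1/2}\theta](t)\,dt + \sigma\,dW(t),$$

and $Z(0) = 0$. The differential $dZ(t)$ shall be understood in the Itô sense.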
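Model (3.10) can be simulated with an Euler scheme on a grid. The sketch below assumes a Brownian design and replaces the stochastic integrals by sums over the grid; the helper names `sqrt_psd` and `simulate_white_noise` are hypothetical. It runs the noise-free case $\sigma = 0$ to check (3.9) and the vanishing of the signal part of $Z_j$ for $j > n$. Since $\hat\varphi_j$ depends only on $\mathbf{X}$, which is independent of $W$, the evaluation point used in the sums only affects the discretization error.

```python
import numpy as np

def sqrt_psd(G, rel_tol=1e-12):
    """Square root of a symmetric PSD matrix; eigenvalues below
    rel_tol * max are treated as exact zeros (numerical null space)."""
    lam, V = np.linalg.eigh(G)
    lam = np.where(lam > rel_tol * lam.max(), lam, 0.0)
    return (V * np.sqrt(lam)) @ V.T

def simulate_white_noise(G_hat, theta, n, sigma, h, rng):
    """Euler scheme for (3.10) on the grid: returns the increments dZ and
    the path values Z(t_i), with Z(0) = 0."""
    drift = np.sqrt(n) * sqrt_psd(G_hat) @ theta     # n^{1/2} [Gamma_hat^{1/2} theta](t_i)
    dW = rng.normal(scale=np.sqrt(h), size=theta.size)
    dZ = drift * h + sigma * dW
    return dZ, np.cumsum(dZ)

rng = np.random.default_rng(4)
grid_size, n = 300, 10
h = 1.0 / grid_size
t = h * np.arange(1, grid_size + 1)
X = np.cumsum(rng.normal(scale=np.sqrt(h), size=(n, grid_size)), axis=1)
theta = np.sin(3 * np.pi * t)                        # hypothetical regression function
G_hat = X.T @ X * h / n                              # discretized Gamma_hat

lam_hat, V = np.linalg.eigh(G_hat)
lam_hat, phi_hat = lam_hat[::-1], V[:, ::-1] / np.sqrt(h)
lam_pos = np.where(lam_hat > 1e-12 * lam_hat.max(), lam_hat, 0.0)

# Noise-free run (sigma = 0): the grid sums of phi_hat_j dZ should equal
# n^{1/2} lam_hat_j^{1/2} <phi_hat_j, theta> and vanish for j > n, cf. (3.9).
dZ, Z_path = simulate_white_noise(G_hat, theta, n, 0.0, h, rng)
Z_coef = phi_hat.T @ dZ
target = np.sqrt(n * lam_pos) * (h * phi_hat.T @ theta)
```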

Now we define the statistical experiment $\mathfrak{E}_{3,n}$ with a completely functional
observation structure. We fix $\Omega_{3,n} = C([0,1])^{(n+1)}$ with the corresponding
Borel $\sigma$-algebra $\mathfrak{A}_{3,n}$. The probability measure $P_{3,n,\theta}$ is defined via the
observation of $\mathbf{X}$ as in $\mathfrak{E}_{2,n}$ and the Itô process $Z(t)$, $t \in [0,1]$, as defined
in (3.10). The definition (3.9) of $Z_j$ can be extended to $j > n$ straightforwardly.
As $\hat\lambda_j = 0$ for $j > n$, we obtain that

$$Z_j = \sigma\int_0^1\hat\varphi_j(t)\,dW(t) \qquad \forall j > n.$$
Moreover, $Z(t)$ is uniquely determined by the $Z_j$ for all integers $j \ge 1$ and
vice versa. That can be seen as follows:

$$Z(t) = \int_0^1 1_{[0,t]}(s)\,dZ(s) = \sum_{j=1}^\infty\langle 1_{[0,t]},\hat\varphi_j\rangle\int_0^1\hat\varphi_j(s)\,dZ(s) = \sum_{j=1}^\infty Z_j\int_0^t\hat\varphi_j(s)\,ds$$

for all $t \in [0,1]$, where the infinite sum must be understood as a limit in mean
square, that is, with respect to $E|\cdot|^2$. That seems to cause some trouble, as we only observe one element of
the probability space. However, convergence in probability implies almost
sure convergence of a subsequence, so that $Z(t)$ is fully accessible by the
observation of all $Z_j$. On the other hand, by a similar argument, all $Z_j$ are
accessible (in practice, that means approximable arbitrarily precisely) by a
trajectory of the process $Z$.

Moreover, the data set $\{Z_j : j > n\}$ is independent of $Z_1,\dots,Z_n$, conditionally
on the $\sigma$-algebra generated by $\mathbf{X}$. Furthermore, the distribution of the
$Z_j$, $j > n$, does not depend on the target parameter $\theta$, so that $Z_j$, for $j > n$,
does not contain any information about $\theta$. We conclude that $(\mathbf{X},Z_1,\dots,Z_n)$
is a sufficient statistic for the observation scheme in the experiment $\mathfrak{E}_{3,n}$.
We can utilize result (ii) from Section 2 in order to prove equivalence of the
experiments $\mathfrak{E}_{2,n}$ and $\mathfrak{E}_{3,n}$. Combining this with Lemma 3.1 and paragraph (iii) from Section 2, we
establish equivalence of the experiments $\mathfrak{E}_{1,n}$ and $\mathfrak{E}_{3,n}$. This result is
presented in the following theorem.

Theorem 3.1. Under condition (3.3), the FLR statistical experiment
$\mathfrak{E}_{1,n}$ is equivalent to the model $\mathfrak{E}_{3,n}$, where one observes $\mathbf{X}$ and the Itô process
$Z(t)$, $t \in [0,1]$, as defined in (3.10).

4. Asymptotic approximation. In the previous section, we have derived
a statistically equivalent white noise model for the FLR problem. However,
the Itô process $Z$ in (3.10) contains the noisy operator $\hat\Gamma$ in its construction.
In the current section, we will replace it by the covariance operator $\Gamma$.

For that purpose, we split the original experiment $\mathfrak{E}_{1,n}$ into two independent
parts $\mathfrak{E}_{1,n,1}$ and $\mathfrak{E}_{1,n,2}$, where $\mathfrak{E}_{1,n,1}$ is based on the observation of the
data $(X_j,Y_j)$, $j = 1,\dots,m$, and $\mathfrak{E}_{1,n,2}$ consists of the residual data $(X_j,Y_j)$,
$j = m+1,\dots,n$. The selection of the integer parameter $m$ is deferred. The
strategy of splitting the sample in the current context follows [19]. Applying
Theorem 3.1 to each of the experiments $\mathfrak{E}_{1,n,k}$, $k = 1,2$, we obtain
equivalence of $\mathfrak{E}_{1,n,k}$ and the experiments $\mathfrak{E}_{4,n,k}$ for $k = 1,2$. Therein, $\mathfrak{E}_{4,n,1}$
is defined by the observation of $\mathbf{X}_1 = (X_1,\dots,X_m)$ and the Itô process $Z_1(t)$,
$t \in [0,1]$, specified by $Z_1(0) = 0$ and

$$dZ_1(t) = m^{1/2}[\hat\Gamma_1^{1/2}\theta](t)\,dt + \sigma\,dW_1(t),$$

and accordingly $\mathfrak{E}_{4,n,2}$ is defined by the observation of $\mathbf{X}_2 = (X_{m+1},\dots,X_n)$
and the Itô process $Z_2(t)$, $t \in [0,1]$, specified by $Z_2(0) = 0$ and

$$dZ_2(t) = (n-m)^{1/2}[\hat\Gamma_2^{1/2}\theta](t)\,dt + \sigma\,dW_2(t).$$

Furthermore, $\hat\Gamma_k$, $k = 1,2$, denotes the empirical covariance operator constructed
from the data $\mathbf{X}_1$ and $\mathbf{X}_2$, respectively. Also note that $W_1$ and $W_2$
are two independent standard Wiener processes. Using criterion (iv) in Section 2
for the experiment $\mathfrak{E}_{4,n}$, which combines the independent experiments
$\mathfrak{E}_{4,n,1}$ and $\mathfrak{E}_{4,n,2}$, we deduce that $\mathfrak{E}_{1,n}$ and $\mathfrak{E}_{4,n}$ are equivalent.

From the experiment $\mathfrak{E}_{4,n,1}$ we construct an estimator $\hat\theta_1$ for $\theta$. We define

$$\hat\theta_1 = \sum_{k=1}^K m^{-1/2}\lambda_k^{-1}\int_0^1[\hat\Gamma_1^{1/2}\varphi_k](t)\,dZ_1(t)\,\varphi_k,$$

where $K$ is an integer-valued smoothing parameter still to be selected. Condition
(3.3) guarantees that all $\lambda_j$ are positive since, otherwise, $\lambda_j = 0$ would
yield that $E|\langle X_1,\varphi_k\rangle|^2 = 0$ for all $k \ge j$, and hence $\sum_{k\ge j}|\langle X_1,\varphi_k\rangle|^2 = 0$ a.s.,
so that $X_1$ would lie in the linear hull of $\varphi_1,\dots,\varphi_{j-1}$. Thus the estimator
$\hat\theta_1$ is well defined.
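The estimator $\hat\theta_1$ only requires the known eigenpairs $(\lambda_k,\varphi_k)$ of $\Gamma$, the operator $\hat\Gamma_1^{1/2}$ and the path of $Z_1$. The following grid-based sketch implements it. The function name `theta_hat_1` and the discretization are our assumptions. The sketch then performs an algebraic sanity check: if $\hat\Gamma_1$ is replaced by $\Gamma$ and the noise is removed, $\hat\theta_1$ reduces to the projection of $\theta$ onto $\varphi_1,\dots,\varphi_K$. This check says nothing about statistical accuracy, which is the subject of the bounds that follow.

```python
import numpy as np

def theta_hat_1(phi, lam, G1_sqrt, dZ1, m, K):
    """Grid version of the spectral cut-off estimator theta_hat_1 (a sketch).

    phi     : grid values of the known eigenfunctions phi_k of Gamma (columns,
              L2-normalized, ordered by decreasing eigenvalue)
    lam     : the corresponding known eigenvalues lambda_k > 0
    G1_sqrt : matrix of Gamma_hat_1^{1/2}, built from the first m design curves
    dZ1     : increments of the process Z_1 on the grid
    """
    # int_0^1 [Gamma_hat_1^{1/2} phi_k](t) dZ_1(t), approximated by sums over the grid
    integrals = (G1_sqrt @ phi[:, :K]).T @ dZ1
    coef = integrals / (np.sqrt(m) * lam[:K])
    return phi[:, :K] @ coef

grid_size, m, K = 300, 50, 6
h = 1.0 / grid_size
t = h * np.arange(1, grid_size + 1)
G = np.minimum.outer(t, t) * h                 # discretized Gamma, Brownian design
lam_asc, V = np.linalg.eigh(G)
G_sqrt = (V * np.sqrt(np.clip(lam_asc, 0.0, None))) @ V.T
lam, phi = lam_asc[::-1], V[:, ::-1] / np.sqrt(h)
theta = t * (1.0 - t)                          # hypothetical target function

# Oracle check: put Gamma in place of Gamma_hat_1 and drop the noise (sigma = 0).
dZ1 = np.sqrt(m) * (G_sqrt @ theta) * h
estimate = theta_hat_1(phi, lam, G_sqrt, dZ1, m, K)
projection = phi[:, :K] @ (h * phi[:, :K].T @ theta)
```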

We introduce the data transformation $T_{4,5,n}: \Omega_{4,n} \to \Omega_{5,n}$, where

$$T_{4,5,n}(\mathbf{x}_1,z_1,\mathbf{x}_2,z_2) = \Big(\mathbf{x}_1, z_1, \mathbf{x}_2, z_2 - (n-m)^{1/2}\int_0^{(\cdot)}[\hat\Gamma_2^{1/2}\hat\theta_1](t)\,dt + (n-m)^{1/2}\int_0^{(\cdot)}[\Gamma^{1/2}\hat\theta_1](t)\,dt\Big).$$

The transformation is fully accessible from the data drawn from the experiment
$\mathfrak{E}_{4,n}$ and the assumed knowledge of the distribution of $X$. We set
$(\Omega_{5,n},\mathfrak{A}_{5,n}) = (\Omega_{4,n},\mathfrak{A}_{4,n})$, where $\Omega_{4,n} = C([0,1])^{(m)} \times C_0([0,1]) \times C([0,1])^{(n-m)} \times
C_0([0,1])$ and $\mathfrak{A}_{4,n}$ is the corresponding Borel $\sigma$-algebra. The data structure
of $\mathfrak{E}_{4,n}$ is represented by $(\mathbf{X}_1,Z_1,\mathbf{X}_2,Z_2)$ when inserting the data set as an
argument of the mapping $T_{4,5,n}$. Note that $z_1 = Z_1$ may be inserted in the
definition of the estimator $\hat\theta_1$. The integral occurring in the definition of $\hat\theta_1$
is not defined for all continuous functions $z_1$ but for almost all trajectories
of $Z_1$. For the other negligible trajectories the integral may conventionally
be put equal to zero to make the mapping $T_{4,5,n}$ well defined on the whole
of its domain.

We define the experiment $\mathfrak{E}_{5,n}$, where one observes the data

$$(\mathbf{X}_1,Z_1,\mathbf{X}_2,Z_2') = T_{4,5,n}(\mathbf{X}_1,Z_1,\mathbf{X}_2,Z_2),$$

where the data $\mathbf{X}_1,Z_1,\mathbf{X}_2,Z_2$ are obtained under experiment $\mathfrak{E}_{4,n}$. The experiment
$\mathfrak{E}_{5,n}$ is defined on the measurable space $(\Omega_{5,n},\mathfrak{A}_{5,n})$. Considering
the definition of $T_{4,5,n}$, we realize that the shift contained in the fourth component
is still available in the experiment $\mathfrak{E}_{5,n}$, as the other components are
kept. Therefore, $T_{4,5,n}$ is an invertible transformation, so that the experiments
$\mathfrak{E}_{4,n}$ and $\mathfrak{E}_{5,n}$ are equivalent.

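In the experiment $\mathfrak{E}_{5,n}$, the component $Z_2'$ is still an Itô process conditionally
on $\mathbf{X}_1,\mathbf{X}_2,Z_1$; by the definition of $T_{4,5,n}$ it satisfies
$dZ_2'(t) = (n-m)^{1/2}[\hat\Gamma_2^{1/2}(\theta-\hat\theta_1) + \Gamma^{1/2}\hat\theta_1](t)\,dt + \sigma\,dW_2(t)$. Now we introduce
the experiment $\mathfrak{E}_{6,n}$ with $(\Omega_{6,n},\mathfrak{A}_{6,n}) = (\Omega_{5,n},\mathfrak{A}_{5,n})$, where one observes the data
$\mathbf{X}_1,Z_1,\mathbf{X}_2$ and the Itô process $S_2(t)$, $t \in [0,1]$, with $S_2(0) = 0$ and

$$dS_2(t) = (n-m)^{1/2}[\Gamma^{1/2}\theta](t)\,dt + \sigma\,dW_2(t).$$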

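In the notation of Section 2, we consider that

(4.1) $$\begin{aligned} &\big|ER_{j,n,\theta}(\mathbf{X}_1,Z_1,\mathbf{X}_2,Z_2') - ER_{j,n,\theta}(\mathbf{X}_1,Z_1,\mathbf{X}_2,S_2)\big|\\ &\quad\le E\Big|1 - \exp\Big(\frac1\sigma\int_0^1\Delta_{5,6}(t)\,dW_2(t) - \frac{1}{2\sigma^2}\|\Delta_{5,6}\|_2^2\Big)\Big|\\ &\quad\le 2E\Big\{1 - \exp\Big(-\frac{1}{2\sigma^2}\|\Delta_{5,6}\|_2^2\Big)\Big\}^{1/2}\\ &\quad\le 2\Big\{1 - \exp\Big(-\frac{1}{2\sigma^2}E\|\Delta_{5,6}\|_2^2\Big)\Big\}^{1/2},\end{aligned}$$

where

$$\Delta_{5,6} = (n-m)^{1/2}\big(\Gamma^{1/2}\theta - \Gamma^{1/2}\hat\theta_1 + \hat\Gamma_2^{1/2}\hat\theta_1 - \hat\Gamma_2^{1/2}\theta\big).$$

Therein, we have used Girsanov's theorem, $\|R_{j,n,\theta}\|_\infty \le 1$ for $R_{j,n,\theta} \in \mathcal{R}_{j,n,\theta}$
as $j = 5,6$, the Bretagnolle-Huber inequality and, in the
last step, Jensen's inequality.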
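Conditionally on $\mathbf{X}_1, Z_1, \mathbf{X}_2$, the shift $\Delta_{5,6}$ is deterministic and enters the Kullback-Leibler divergence only through $\|\Delta_{5,6}\|_2^2/(2\sigma^2)$. The Bretagnolle-Huber step can therefore be checked in the simplest Gaussian shift case. The sketch below assumes the standard reduction to $N(0,1)$ versus $N(\delta,1)$ with $\delta = \|\Delta_{5,6}\|_2/\sigma$, where the total variation distance is known in closed form, and uses only Python's standard library.

```python
import math

def tv_gaussian_shift(delta):
    """Total variation distance between N(0, 1) and N(delta, 1)."""
    return math.erf(abs(delta) / (2.0 * math.sqrt(2.0)))

def bretagnolle_huber_bound(kl):
    """Bretagnolle-Huber inequality: TV(P, Q) <= sqrt(1 - exp(-KL(P, Q)))."""
    return math.sqrt(1.0 - math.exp(-kl))

# For a drift shift Delta in white noise with level sigma, KL = ||Delta||^2 / (2 sigma^2),
# i.e. KL = delta^2 / 2 with delta = ||Delta|| / sigma.
for delta in [0.0, 0.1, 0.5, 1.0, 2.0, 5.0]:
    tv = tv_gaussian_shift(delta)
    bound = bretagnolle_huber_bound(delta**2 / 2.0)
    print(f"delta={delta:4.1f}  TV={tv:.4f}  BH bound={bound:.4f}")
```

The bound is loose for moderate shifts, for example TV is about 0.383 against a bound of about 0.627 at $\delta = 1$. It only needs to tend to zero, which is why (4.1) reduces the problem to showing $E\|\Delta_{5,6}\|_2^2 \to 0$.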

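Now we study the expectation occurring in (4.1) by Parseval's identity
with respect to the basis $\{\hat\varphi_{k,2}\}_{k\ge1}$ and the orthogonal expansion of $\hat\varphi_{k,2}$
with respect to $\{\varphi_j\}_{j\ge1}$, where $\{\hat\varphi_{k,2}\}_{k\ge1}$ denotes the eigenfunctions of $\hat\Gamma_2$
and $\hat\lambda_{k,2}$ the corresponding eigenvalues:

$$\begin{aligned} E\|\Delta_{5,6}\|_2^2 &= (n-m)\,E\sum_{k=1}^\infty\big|\langle(\Gamma^{1/2}-\hat\Gamma_2^{1/2})(\theta-\hat\theta_1),\hat\varphi_{k,2}\rangle\big|^2\\ &= (n-m)\,E\sum_{k=1}^\infty\Big|\sum_{j=1}^\infty\big(\lambda_j^{1/2}-\hat\lambda_{k,2}^{1/2}\big)\langle\varphi_j,\hat\varphi_{k,2}\rangle\langle\varphi_j,\theta-\hat\theta_1\rangle\Big|^2\\ &\le (n-m)\,E\sum_{k=1}^\infty\sum_{j=1}^\infty j^{-\gamma}\big|\lambda_j-\hat\lambda_{k,2}\big|\,|\langle\varphi_j,\hat\varphi_{k,2}\rangle|^2\cdot\sum_{j'=1}^\infty j'^{\gamma}E|\langle\varphi_{j'},\theta-\hat\theta_1\rangle|^2,\end{aligned}$$

where we have used the Cauchy-Schwarz inequality for sums and the elementary
inequality $(\sqrt{x}-\sqrt{y})^2 \le |x-y|$ for all $x,y \ge 0$. Therein $\gamma > 0$ is
still to be selected. Also, the independence of $\hat\Gamma_2$ and $\hat\theta_1$ has been utilized.
Then, we apply the Cauchy-Schwarz inequality with respect to the discrete
random variable $V$ satisfying $P[V = |\hat\lambda_{k,2}-\lambda_j|] = |\langle\varphi_j,\hat\varphi_{k,2}\rangle|^2$ for all integers
$k \ge 1$ and some fixed integer $j$, conditionally on $\mathbf{X}_2$. We conclude that

$$\begin{aligned} E\|\Delta_{5,6}\|_2^2 &\le (n-m)\Big\{\sum_{j'=1}^\infty j'^{\gamma}E|\langle\varphi_{j'},\theta-\hat\theta_1\rangle|^2\Big\}\sum_{j=1}^\infty j^{-\gamma}E\Big\{\sum_{k=1}^\infty|\lambda_j-\hat\lambda_{k,2}|^2|\langle\varphi_j,\hat\varphi_{k,2}\rangle|^2\Big\}^{1/2}\\ &\le (n-m)\Big\{\sum_{j'=1}^\infty j'^{\gamma}E|\langle\varphi_{j'},\theta-\hat\theta_1\rangle|^2\Big\}\sum_{j=1}^\infty j^{-\gamma}\big\{E\|\hat\Gamma_2\varphi_j-\Gamma\varphi_j\|_2^2\big\}^{1/2}.\end{aligned}$$

We observe that

(4.2) $$\begin{aligned} E\|\hat\Gamma_2\varphi_j-\Gamma\varphi_j\|_2^2 &= E\Big\|\frac{1}{n-m}\sum_{k=m+1}^n\big(X_k\langle X_k,\varphi_j\rangle - EX_k\langle X_k,\varphi_j\rangle\big)\Big\|_2^2\\ &\le (n-m)^{-1}E\|X_1\|_2^2|\langle X_1,\varphi_j\rangle|^2\\ &\le (n-m)^{-1}c_j^2E|\langle X_1,\varphi_j\rangle|^2 + (n-m)^{-1}E\|X_1\|_2^4\,1_{(c_j,\infty)}(\|X_1\|_2)\\ &\le (n-m)^{-1}c_j^2\lambda_j + (n-m)^{-1}\sum_{k=\lfloor c_j\rfloor}^\infty(k+1)^4P[\|X_1\|_2 > k]\\ &\le \mathrm{const}\cdot(n-m)^{-1}\big(c_j^2\lambda_j + C_{X,0}\exp(-C_{X,1}c_j^{C_{X,2}}/2)\big)\end{aligned}$$

for $n$ sufficiently large, where the sequence $(c_j)_j \uparrow \infty$ remains to be determined.
In order to obtain those results, we impose the following: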

Condition X. We assume that condition (3.3) holds true; $C_{\lambda,2}j^{-\alpha} \ge
\lambda_j \ge C_{\lambda,1}j^{-\alpha}$ for all integers $j \ge 1$ and some $\alpha \ge 2$, $C_{\lambda,2} \ge C_{\lambda,1} > 0$; $EX_1 = 0$;
$P[\|X_1\|_2 > x] \le C_{X,0}\exp(-C_{X,1}x^{C_{X,2}})$ for all $x > 0$ and some finite constants
$C_{X,0}, C_{X,1}, C_{X,2} > 0$.

Condition X imposes polynomial upper and lower bounds on the sequence of the
eigenvalues of $\Gamma$. This assumption is very common in FLR (see, e.g., [5]).
Under Condition X, the underlying inverse problem can be viewed
as a moderately ill-posed problem, unlike severely ill-posed problems where
exponential decay of the eigenvalues occurs. Condition X also corresponds to
the deconvolution setting with ordinary smooth error densities in the related
field of density estimation based on contaminated data.
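The first two lines of (4.2) can be checked by simulation. The sketch below assumes a Brownian-motion design, a Riemann-sum discretization and $j = 1$, and writes `N` for the sample size $n - m$; all of these are our choices. For i.i.d. centered summands, $E\|\hat\Gamma_2\varphi_1 - \Gamma\varphi_1\|_2^2 = N^{-1}\big(E\|X_1\|_2^2|\langle X_1,\varphi_1\rangle|^2 - \|\Gamma\varphi_1\|_2^2\big)$, so the Monte Carlo mean squared error should fall below the bound $N^{-1}E\|X_1\|_2^2|\langle X_1,\varphi_1\rangle|^2$.

```python
import numpy as np

rng = np.random.default_rng(5)
grid_size, N, reps = 200, 100, 400             # N plays the role of n - m
h = 1.0 / grid_size
t = h * np.arange(1, grid_size + 1)
G = np.minimum.outer(t, t) * h                 # discretized Gamma, Brownian design
lam_asc, V = np.linalg.eigh(G)
phi = V[:, -1] / np.sqrt(h)                    # leading eigenfunction phi_1, L2-normalized
G_phi = G @ phi                                # Gamma phi_1 on the grid

errors, moments = [], []
for _ in range(reps):
    X = np.cumsum(rng.normal(scale=np.sqrt(h), size=(N, grid_size)), axis=1)
    scores = h * X @ phi                       # <X_k, phi_1>
    G_hat_phi = X.T @ scores / N               # Gamma_hat phi_1 = N^{-1} sum_k X_k <X_k, phi_1>
    errors.append(h * np.sum((G_hat_phi - G_phi) ** 2))
    norms_sq = h * np.sum(X**2, axis=1)        # ||X_k||_2^2
    moments.append(np.mean(norms_sq * scores**2))

mse = np.mean(errors)      # Monte Carlo estimate of E||Gamma_hat phi_1 - Gamma phi_1||_2^2
bound = np.mean(moments) / N   # Monte Carlo estimate of N^{-1} E||X_1||_2^2 <X_1, phi_1>^2
```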

As an example of a stochastic process which satisfies Condition X, we mention the random variables

$$X = \sum_{j=1}^{\infty} j^{-\alpha/2} G_j \varphi_j,$$

where the $\varphi_j$, $j \geq 1$, form an arbitrary orthonormal basis of $L_2([0,1])$, the $G_j$ are i.i.d. real-valued centered random variables with a continuous distribution which is concentrated on some compact interval, and $EG_1^2 = 1$. We stipulate that $\alpha > 2$. Easy calculations show that the coefficients $j^{-\alpha}$ and the $\varphi_j$ are the eigenvalues and the eigenvectors of the corresponding covariance operator $\Gamma$, respectively. Stipulating that the sequence $\{\|\varphi_j\|_\infty\}_{j\geq 1}$ is bounded above (as satisfied, e.g., by the Fourier polynomials), we can show that Condition X is fulfilled. In particular, the random variable $\langle g, X\rangle$ is continuously distributed for any $g \in L_2([0,1])\setminus\{0\}$: since $\langle g,\varphi_j\rangle \neq 0$ for at least one integer $j$, the distribution of $\langle g,X\rangle$ is the convolution of an absolutely continuous distribution and some other distribution; hence the distribution of $\langle g,X\rangle$ has a Lebesgue density, so that condition (3.3) can be verified. All other assumptions contained in Condition X can easily be checked. Another, even more important example of design distributions are the Gaussian processes $X(t) = \int_0^t a(s)\,dW(s)$, $t \in [0,1]$, where $W$ denotes a standard Wiener process and $a$ is a sufficiently smooth function which is bounded from above and below by positive constants. These processes satisfy Condition X as well, with $\alpha = 2$. The decay condition can be verified via the famous reflection principle of Wiener processes.
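The claim that the Gaussian design has $\alpha = 2$ can be checked numerically in the simplest case $a \equiv 1$, where the covariance kernel is $\min\{s,t\}$ and the eigenvalues are known to be $((j-1/2)\pi)^{-2}$. The sketch below is our own illustration (grid size, iteration counts and function names are arbitrary choices, not from the paper): it discretizes the covariance operator and extracts the two leading eigenvalues by power iteration with deflation. Their ratio should be close to $9 = (3/2)^2/(1/2)^2$, consistent with $\lambda_j \asymp j^{-2}$.

```python
import math


def brownian_covariance(N):
    """Discretized covariance operator of standard Brownian motion on [0, 1].

    Kernel min(s, t) on the grid t_a = (a + 1) / N, multiplied by the
    quadrature weight 1 / N so that the matrix eigenvalues approximate
    the eigenvalues of the integral operator.
    """
    t = [(a + 1) / N for a in range(N)]
    return [[min(ta, tb) / N for tb in t] for ta in t]


def leading_eigenvalues(K, count=2, iters=60):
    """Leading eigenvalues of a symmetric positive semidefinite matrix,
    computed by power iteration followed by deflation K <- K - lam v v^T."""
    N = len(K)
    K = [row[:] for row in K]
    eigenvalues = []
    for _ in range(count):
        v = [1.0 / math.sqrt(N)] * N
        for _ in range(iters):
            w = [sum(K[i][j] * v[j] for j in range(N)) for i in range(N)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient of the (approximate) unit eigenvector
        Kv = [sum(K[i][j] * v[j] for j in range(N)) for i in range(N)]
        lam = sum(vi * kvi for vi, kvi in zip(v, Kv))
        eigenvalues.append(lam)
        for i in range(N):
            for j in range(N):
                K[i][j] -= lam * v[i] * v[j]
    return eigenvalues


lam1, lam2 = leading_eigenvalues(brownian_covariance(100))
print(lam1, 4 / math.pi ** 2)        # discretized vs. exact ((1/2) pi)^(-2)
print(lam2, 4 / (9 * math.pi ** 2))  # discretized vs. exact ((3/2) pi)^(-2)
print(lam1 / lam2)                   # close to 9, i.e. lambda_j ~ j^(-2)
```

A finer grid reduces the discretization bias of order $1/N$ at quadratic cost in $N$.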

Returning to the investigation of an upper bound on $E\|\Delta_{5,6}\|_1$, the following inequality is evident:

i?||A 5i6 ||2<(n-m) 1 / 2 |^j'^|(^,e-0 1 )| 2 | 

OO 

x E-T 7 ( c i^' + C x , eM-Cx,ic^ 2 )) 1/2 . 
3=1 
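We deduce that

$$\begin{aligned}
E|\langle\varphi_{j'},\theta-\hat\theta_1\rangle|^2
&= E\Big|\langle\varphi_{j'},\theta\rangle - 1_{\{j'\le K\}}\, m^{-1/2}\lambda_{j'}^{-1}\int [\hat\Gamma_1^{1/2}\varphi_{j'}](t)\,dZ_1(t)\Big|^2\\
&= 1_{\{j'>K\}}|\langle\varphi_{j'},\theta\rangle|^2 + 1_{\{j'\le K\}}\lambda_{j'}^{-2}\, E\Big|\int [\hat\Gamma_1^{1/2}\varphi_{j'}](t)[\hat\Gamma_1^{1/2}\theta](t)\,dt - \lambda_{j'}\langle\theta,\varphi_{j'}\rangle + \sigma m^{-1/2}\int[\hat\Gamma_1^{1/2}\varphi_{j'}](t)\,dW_1(t)\Big|^2\\
&= 1_{\{j'>K\}}|\langle\varphi_{j'},\theta\rangle|^2 + 1_{\{j'\le K\}}\lambda_{j'}^{-2}\big\{E|\langle\hat\Gamma_1\theta,\varphi_{j'}\rangle - \langle\Gamma\theta,\varphi_{j'}\rangle|^2 + \sigma^2 m^{-1}E\|\hat\Gamma_1^{1/2}\varphi_{j'}\|_2^2\big\}\\
&\le 1_{\{j'>K\}}|\langle\varphi_{j'},\theta\rangle|^2 + 1_{\{j'\le K\}}\lambda_{j'}^{-2}\big\{m^{-1}\|\theta\|_2^2\,E\|X_1\|_2^2|\langle X_1,\varphi_{j'}\rangle|^2 + \sigma^2 m^{-1}\lambda_{j'}\big\}\\
&\le 1_{\{j'>K\}}|\langle\varphi_{j'},\theta\rangle|^2 + 1_{\{j'\le K\}}\lambda_{j'}^{-2}\big\{m^{-1}\|\theta\|_2^2\big(c_{j'}^2\lambda_{j'} + C'_{X,0}\exp(-C_{X,1}c_{j'}^{C_{X,2}}/2)\big) + \sigma^2 m^{-1}\lambda_{j'}\big\}.
\end{aligned}\tag{4.4}$$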

For further investigation of the asymptotic quality of the estimator $\hat\theta_1$, some conditions on $P_X$ and $\Theta$ are required. They are stated such that, combined with Condition X, they include all previously imposed assumptions concerning those characteristics.

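Condition T. We assume that

$$\sum_{k=1}^{\infty}(1+k^{2\beta})|\langle\theta,\varphi_k\rangle|^2 \le C_\Theta$$

for all $\theta\in\Theta$ and some constants $\beta > (\alpha+1)/2$ and $C_\Theta < \infty$, which are uniform with respect to $\theta\in\Theta$.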

Condition T says that the $\theta\in\Theta$ are uniformly well approximable with respect to the orthonormal basis consisting of the eigenfunctions of $\Gamma$. The parameter $\beta$ describes the degree of this approximability. If the $\varphi_k$ were some Fourier polynomials, then Condition T could be interpreted as Sobolev constraints on the set of the target functions.

We apply the parameter selection $K \asymp m^{1/(2\beta+\alpha+1)}$, and we fix $c_j = d_0\log^{d_1} j$ with some sufficiently large constants $d_0, d_1$ and $\gamma\in(0,\beta-\alpha/2-1/2)$. Also, we choose $m = \lfloor n/2\rfloor$. Inserting that result into (4.3), we deduce by Conditions X and T that

$$\sup_{\theta\in\Theta} E\|\Delta_{5,6}\|_1 = O\big(n^{(\alpha+1-2\beta+2\gamma)/(4\beta+2\alpha+2)}\log^{d_3} n\big) = o(1)$$

for some $d_3 > 0$, due to the inequality $\beta > (\alpha+1)/2$ and the suitable selection of $\gamma$. Revisiting inequality (4.1), we have finally proved by Section 2, paragraph (i) that the experiments $\mathfrak{E}_{5,n}$ and $\mathfrak{E}_{6,n}$ are asymptotically equivalent.
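The exponent bookkeeping behind the last display can be checked with exact rational arithmetic. The helpers below are hypothetical names of our own choosing. They evaluate the exponent $(\alpha+1-2\beta+2\gamma)/(4\beta+2\alpha+2)$, the upper end $\beta-\alpha/2-1/2$ of the admissible range of $\gamma$, and the cut-off exponent $1/(2\beta+\alpha+1)$ of $K$. The exponent is negative exactly when $\gamma$ lies below that upper end, which is the role of the selection of $\gamma$.

```python
from fractions import Fraction as F


def delta_exponent(alpha, beta, gamma):
    """Exponent e in sup_theta E||Delta_{5,6}||_1 = O(n^e log^{d_3} n)."""
    return F(alpha + 1 - 2 * beta + 2 * gamma) / (4 * beta + 2 * alpha + 2)


def gamma_bound(alpha, beta):
    """Upper end of the admissible interval (0, beta - alpha/2 - 1/2) for gamma."""
    return beta - F(alpha, 2) - F(1, 2)


def cutoff_exponent(alpha, beta):
    """K ~ m^{1/(2 beta + alpha + 1)}."""
    return F(1, 2 * beta + alpha + 1)


alpha, beta = 2, 3  # satisfies beta > (alpha + 1) / 2
print(gamma_bound(alpha, beta))                               # 3/2
print(delta_exponent(alpha, beta, F(1)))                      # -1/18
print(delta_exponent(alpha, beta, gamma_bound(alpha, beta)))  # 0
print(cutoff_exponent(alpha, beta))                           # 1/9
```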

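In the experiment $\mathfrak{E}_{6,n}$, the observation of $S_2$ allows us to construct an estimator $\hat\theta_2$ of $\theta$ as well. It is given by

$$\hat\theta_2 = (n-m)^{-1/2}\sum_{k=1}^{K}\lambda_k^{-1}\int[\Gamma^{1/2}\varphi_k](t)\,dS_2(t)\,\varphi_k,$$

where the parameter $K$ can be adopted from the estimator $\hat\theta_1$. We specify the transformation $T_{6,7,n}:\Omega_{6,n}\to\Omega_{7,n}$ with

$$T_{6,7,n}(x_1,z_1,x_2,s_2) = \Big(x_1,\ z_1 - m^{1/2}\int_0^{\cdot}[\hat\Gamma_1^{1/2}\hat\theta_2](t)\,dt + m^{1/2}\int_0^{\cdot}[\Gamma^{1/2}\hat\theta_2](t)\,dt,\ x_2,\ s_2\Big).$$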

Again, the shift of the second component is determined by the other components, which are maintained under the mapping, so that $T_{6,7,n}$ is invertible. Therefore, we define the experiment $\mathfrak{E}_{7,n}$ by the observation of $T_{6,7,n}(X_1,Z_1,X_2,S_2)$ with $(X_1,Z_1,X_2,S_2)$ as under the experiment $\mathfrak{E}_{6,n}$. Hence, we put $(\Omega_{7,n},\mathfrak{A}_{7,n}) = (\Omega_{6,n},\mathfrak{A}_{6,n})$ and obtain that $\mathfrak{E}_{7,n}$ is equivalent to $\mathfrak{E}_{6,n}$.
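In the eigenbasis of $\Gamma$, the estimator $\hat\theta_2$ reduces to a Gaussian sequence model. This is a minimal sketch under the assumption (analogous to $S_1$) that $dS_2(t) = (n-m)^{1/2}[\Gamma^{1/2}\theta](t)\,dt + \sigma\,dW_2(t)$. Since $\int[\Gamma^{1/2}\varphi_k](t)\,dW_2(t)$ is centered normal with variance $\|\Gamma^{1/2}\varphi_k\|_2^2 = \lambda_k$, and these integrals are uncorrelated over $k$, the $k$th coefficient of $\hat\theta_2$ equals $\langle\theta,\varphi_k\rangle + \sigma((n-m)\lambda_k)^{-1/2}\xi_k$ with i.i.d. standard normal $\xi_k$. The Monte Carlo check below uses illustrative parameter values of our choosing and compares the empirical MISE with $\sum_{k>K}\langle\theta,\varphi_k\rangle^2 + \sigma^2(n-m)^{-1}\sum_{k\le K}\lambda_k^{-1}$.

```python
import math
import random


def theta2_coefficients(theta, lam, n_minus_m, sigma, K, rng):
    """Coefficients <theta_hat_2, phi_k> in the eigenbasis of Gamma.

    The projection of dS_2 onto Gamma^{1/2} phi_k is simulated as
    sqrt(n - m) * lam_k * theta_k + sigma * sqrt(lam_k) * xi_k, xi_k ~ N(0, 1),
    and rescaled by (n - m)^{-1/2} lam_k^{-1}; coefficients with k > K are zero.
    """
    coefs = []
    for k in range(len(theta)):
        if k >= K:
            coefs.append(0.0)
            continue
        projection = (math.sqrt(n_minus_m) * lam[k] * theta[k]
                      + sigma * math.sqrt(lam[k]) * rng.gauss(0.0, 1.0))
        coefs.append(projection / (math.sqrt(n_minus_m) * lam[k]))
    return coefs


def theoretical_mise(theta, lam, n_minus_m, sigma, K):
    bias = sum(t * t for t in theta[K:])
    variance = sigma ** 2 / n_minus_m * sum(1.0 / l for l in lam[:K])
    return bias + variance


def squared_error(coefs, theta):
    return sum((c - t) ** 2 for c, t in zip(coefs, theta))


rng = random.Random(1)
kmax, K, n_minus_m, sigma, reps = 20, 5, 500, 1.0, 5000
lam = [(k + 1) ** -2.0 for k in range(kmax)]    # lambda_k = k^{-2}
theta = [(k + 1) ** -2.0 for k in range(kmax)]  # <theta, phi_k> = k^{-2}
mc = sum(squared_error(theta2_coefficients(theta, lam, n_minus_m, sigma, K, rng), theta)
         for _ in range(reps)) / reps
th = theoretical_mise(theta, lam, n_minus_m, sigma, K)
print(mc, th)
```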

We define the experiment $\mathfrak{E}_{8,n}$ by the observation of $(X_1,S_1,X_2,S_2)$ on the probability space $(\Omega_{8,n},\mathfrak{A}_{8,n}) = (\Omega_{7,n},\mathfrak{A}_{7,n})$, where $S_1$ denotes the Itô process with $S_1(0) = 0$ and

$$dS_1(t) = m^{1/2}[\Gamma^{1/2}\theta](t)\,dt + \sigma\,dW_1(t).$$

We can show that $\mathfrak{E}_{8,n}$ is asymptotically equivalent to $\mathfrak{E}_{7,n}$ analogously to the proof of the asymptotic equivalence of $\mathfrak{E}_{5,n}$ and $\mathfrak{E}_{6,n}$. The only notable modification concerns the application of the estimator $\hat\theta_2$ instead of $\hat\theta_1$. However, even for that term we establish an upper bound at the same rate as for the estimator $\hat\theta_1$, since the asymptotic orders of $m$ and $n-m$ coincide.

Taking a closer look at the data drawn from $\mathfrak{E}_{8,n}$, we realize that the random variables $X_1, S_1, X_2, S_2$ are independent. That occurs as we have replaced the empirical covariance operators by the true deterministic one. Furthermore, the data sets $X_1, X_2$ do not carry any information on $\theta$, so that $S_1, S_2$ represent a sufficient statistic for the whole empirical information obtained under $\mathfrak{E}_{8,n}$. By Section 2, paragraph (ii), we conclude that $\mathfrak{E}_{8,n}$ is equivalent to the experiment $\mathfrak{E}_{9,n}$ in which only the observations $S_1, S_2$ are available. Thus we put $\Omega_{9,n} = C_b([0,1])\times C_b([0,1])$ and $\mathfrak{A}_{9,n}$ equal to the corresponding Borel $\sigma$-algebra.

We define the transformation $T_{9,10,n}:\Omega_{9,n}\to\Omega_{10,n}$ with $(\Omega_{10,n},\mathfrak{A}_{10,n}) = (\Omega_{9,n},\mathfrak{A}_{9,n})$ by

$$T_{9,10,n}(s_1,s_2) = A(s_1,s_2)^T$$

with the matrix

$$A = \begin{pmatrix} m^{1/2}/n & (n-m)^{1/2}/n\\ m^{-1/2} & -(n-m)^{-1/2}\end{pmatrix}.$$

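We easily verify that $A$ is invertible, so that the experiment $\mathfrak{E}_{10,n}$, which is defined by the observation of $(T_1,T_2) = T_{9,10,n}(S_1,S_2)$, is equivalent to the experiment $\mathfrak{E}_{9,n}$. We consider the characteristic function of the $L_2([0,1])\times L_2([0,1])$-valued random variable $(T_1,T_2)$, where $e_1, e_2$ denote the canonical unit vectors of $\mathbb{R}^2$:

$$\begin{aligned}
E\exp\big(i\langle t_1,T_1\rangle + i\langle t_2,T_2\rangle\big)
&= E\exp\big(i\langle e_1^TA^T(t_1,t_2)^T,S_1\rangle\big)\cdot E\exp\big(i\langle e_2^TA^T(t_1,t_2)^T,S_2\rangle\big)\\
&= \exp\Big(i\int_0^1 t_1(x)\int_0^x[\Gamma^{1/2}\theta](t)\,dt\,dx\Big)\\
&\quad\times\exp\Big(-\frac{\sigma^2}{2n}\int_0^1\!\!\int_0^1 t_1(x_1)\min\{x_1,x_2\}t_1(x_2)\,dx_1\,dx_2\Big)\\
&\quad\times\exp\Big(-\frac{\sigma^2}{2}\Big(\frac{1}{m}+\frac{1}{n-m}\Big)\int_0^1\!\!\int_0^1 t_2(x_1)\min\{x_1,x_2\}t_2(x_2)\,dx_1\,dx_2\Big)
\end{aligned}$$

for any $t_1,t_2\in L_2([0,1])$, so that $T_1$ and $T_2$ are two Itô processes satisfying $T_1(0)=T_2(0)=0$ and

$$\begin{aligned}
dT_1(t) &= [\Gamma^{1/2}\theta](t)\,dt + \sigma n^{-1/2}\,dW_3(t),\\
dT_2(t) &= \sigma\Big(\frac{1}{m}+\frac{1}{n-m}\Big)^{1/2}dW_4(t),
\end{aligned}$$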

where $W_3$ and $W_4$ are two independent Wiener processes. Thus $T_1$ and $T_2$ are independent, and $T_2$ is totally uninformative with respect to the target function $\theta$. Applying Section 2, paragraph (ii) again, we have established equivalence of $\mathfrak{E}_{10,n}$ and the experiment $\mathfrak{E}_{11,n}$, which is equipped with $\Omega_{11,n} = C_b([0,1])$ and the corresponding Borel $\sigma$-algebra $\mathfrak{A}_{11,n}$, and characterized by the observation of the process $T_1$, which coincides with the process $Y$ as defined in (1.3).
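The drift and covariance computation behind the characteristic function can be double-checked numerically. Writing $dS_1 = m^{1/2}[\Gamma^{1/2}\theta]\,dt + \sigma\,dW_1$ and, as we assume here, $dS_2 = (n-m)^{1/2}[\Gamma^{1/2}\theta]\,dt + \sigma\,dW_2$ with independent $W_1, W_2$, the pair $(T_1,T_2)^T = A(S_1,S_2)^T$ has drift coefficients $A(m^{1/2},(n-m)^{1/2})^T$ in front of $[\Gamma^{1/2}\theta]$ and noise covariance $\sigma^2AA^T$. The sketch below (helper names are ours) confirms that these equal $(1,0)^T$ and $\operatorname{diag}(\sigma^2/n,\ \sigma^2(1/m+1/(n-m)))$, and that $\det A = -(m(n-m))^{-1/2} \neq 0$.

```python
import math


def transform_matrix(m, n):
    """The matrix A defining T_{9,10,n}(s_1, s_2) = A (s_1, s_2)^T."""
    return [[math.sqrt(m) / n, math.sqrt(n - m) / n],
            [1 / math.sqrt(m), -1 / math.sqrt(n - m)]]


def transformed_moments(m, n, sigma):
    """Drift coefficients, noise covariance and determinant for (T_1, T_2)^T = A (S_1, S_2)^T."""
    A = transform_matrix(m, n)
    drift_in = [math.sqrt(m), math.sqrt(n - m)]  # multiples of [Gamma^{1/2} theta] in S_1, S_2
    drift = [sum(A[i][j] * drift_in[j] for j in range(2)) for i in range(2)]
    cov = [[sigma ** 2 * sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return drift, cov, det


drift, cov, det = transformed_moments(m=400, n=1000, sigma=2.0)
print(drift)  # drift of T_1 is [Gamma^{1/2} theta], T_2 has no drift
print(cov)    # diagonal: T_1 and T_2 are independent
print(det)    # nonzero, so A is invertible
```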

Summarizing, we have shown asymptotic equivalence of the experiments $\mathfrak{E}_{1,n}$ and $\mathfrak{E}_{11,n}$. That provides our final main result, which is stated in the following theorem.

Theorem 4.1. Under Conditions X and T, the FLR experiment $\mathfrak{E}_{1,n}$ with known design distribution and independent $N(0,\sigma^2)$-distributed regression errors is asymptotically equivalent to the white noise experiment $\mathfrak{E}_{11,n}$, where only the Itô process $Y$ as in (1.3) is observed.






5. Sharp estimation for unknown $P_X$. We can combine our results with Theorem 1 in [11], which is due to [20], in order to derive a sharp minimax result with respect to the MISE for the FLR problem under known covariance operator. It follows from there that this sharp minimax risk corresponds to the sequence

$$a_n = \sigma^2n^{-1}\sum_{k=1}^{\infty}\lambda_k^{-1}\big(1-\gamma(1+k^{2\beta})^{1/2}\big)_+,$$

where $\gamma$ is the unique solution of the equation

$$\frac{\sigma^2}{n}\sum_{k=1}^{\infty}\lambda_k^{-1}(1+k^{2\beta})^{1/2}\big(1-\gamma(1+k^{2\beta})^{1/2}\big)_+ = C_\Theta\gamma,$$

under the conditions of Theorem 4.1. More concretely, there exists an estimator $\hat\theta$ of $\theta$ in the FLR model which satisfies

$$\sup_{\theta\in\Theta}E\|\hat\theta-\theta\|_2^2 = a_n\{1+o(1)\},$$

while any other estimator in the underlying model satisfies the above relation when $=$ is replaced by $\ge$. Thus, we have established sharp asymptotic constants.
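For a concrete eigenvalue sequence, $\gamma$ and $a_n$ can be computed numerically. The sketch below assumes, purely for illustration, $\lambda_k = k^{-\alpha}$; the function and parameter names are ours. The left-hand side of the equation for $\gamma$ is continuous and decreasing in $\gamma$ while $C_\Theta\gamma$ is increasing, so bisection locates the unique root. Only finitely many terms are nonzero, because $(1-\gamma(1+k^{2\beta})^{1/2})_+ = 0$ for large $k$.

```python
import math


def pinsker_sharp_risk(n, sigma, alpha, beta, c_theta, tol=1e-14):
    """Return (gamma, a_n) for the illustrative choice lambda_k = k^{-alpha}."""

    def beta_k(k):
        return math.sqrt(1.0 + k ** (2 * beta))

    def active_terms(x):
        # all k with 1 - x * beta_k > 0; beta_k is increasing in k
        k = 1
        while beta_k(k) < 1.0 / x:
            yield k
            k += 1

    def lhs(x):
        # sigma^2 / n * sum_k lambda_k^{-1} beta_k (1 - x beta_k)_+
        return sigma ** 2 / n * sum(k ** alpha * beta_k(k) * (1 - x * beta_k(k))
                                     for k in active_terms(x))

    def excess(x):
        return lhs(x) - c_theta * x

    hi = 1.0                # beta_k > 1 for all k, hence lhs(1) = 0 and excess(1) < 0
    lo = 0.5
    while excess(lo) <= 0:  # lhs(x) stays positive as x -> 0 while c_theta * x -> 0
        lo /= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    a_n = sigma ** 2 / n * sum(k ** alpha * (1 - gamma * beta_k(k)) for k in active_terms(gamma))
    return gamma, a_n


gamma, a_n = pinsker_sharp_risk(n=10_000, sigma=1.0, alpha=2, beta=2, c_theta=1.0)
print(gamma, a_n)
```

Increasing $n$ moves the root $\gamma$ towards zero, so that more coefficients receive nonzero weight, while $a_n$ decreases.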

We point out that the loss function $a_n^{-1}\|\hat\theta-\theta\|_2^2$ is unbounded. Still, asymptotic equivalence yields the coincidence of sharp minimaxity for the truncated loss function $\min\{D_n, a_n^{-1}\|\hat\theta-\theta\|_2^2\}$ for some $(D_n)_n\to\infty$ sufficiently slowly. We can show that, in the white noise inverse problem, the sharp constant result extends to the truncated loss function. Using Theorem 4.1, we thus obtain a sharp lower bound for the FLR model even for the truncated loss function.

However, the design distribution $P_X$ is assumed to be known and occurs in the definition of the minimax estimator. On the other hand, the lower bound derived from Theorem 4.1 in the previous section is also a lower bound for the FLR model in the case of unknown $P_X$, since lack of knowledge of $P_X$ cannot improve this lower bound. Thus, if we succeed in showing that some estimator achieves these asymptotic properties when $P_X$ is unknown, then sharp asymptotic minimaxity extends to this more realistic setting. Assuming that all conditions of Theorem 4.1 except the knowledge of $P_X$ hold true, we propose the estimator

$$\hat\theta = \sum_k w_k\,\hat\lambda_{k,\rho}^{-1}\Big\langle \frac{1}{n}\sum_{j=1}^n Y_jX_j,\ \hat\varphi_k\Big\rangle\hat\varphi_k \tag{5.1}$$

for $\theta$, where $\hat\lambda_k$ and $\hat\varphi_k$ denote the eigenvalues and eigenfunctions of the empirical covariance operator $\hat\Gamma$, and $\hat\lambda_{k,\rho} = \max\{\hat\lambda_k, n^{-\rho}\}$ for some $\rho\in(0,1/2)$. The weights $w_k$ remain to be specified. Using the techniques of [5] and [15], the MISE of $\hat\theta$ is equal to
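$$E\|\hat\theta-\theta\|_2^2 = \sum_k E\Big|\Big(\frac{w_k\hat\lambda_k}{\hat\lambda_{k,\rho}}-1\Big)\langle\hat\varphi_k,\theta\rangle\Big|^2 + \frac{\sigma^2}{n}\sum_k E\Big[w_k^2\,\frac{\hat\lambda_k}{\hat\lambda_{k,\rho}^2}\Big]. \tag{5.2}$$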
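The estimator (5.1) can be sketched in a finite-dimensional setting in which each design curve is represented by its coefficient vector in some fixed orthonormal basis, so that $L_2$ inner products become dot products. This discretization, the Gaussian simulation design, the Jacobi eigen-solver and all function names are our own illustrative assumptions, not part of the paper. The sketch computes the empirical covariance $\hat\Gamma$, its eigenpairs $(\hat\lambda_k,\hat\varphi_k)$ in decreasing order, the truncated eigenvalues $\hat\lambda_{k,\rho} = \max\{\hat\lambda_k,n^{-\rho}\}$ and the weighted spectral estimator.

```python
import math
import random


def jacobi_eigh(a, sweeps=100, tol=1e-22):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric
    matrix, computed by the cyclic Jacobi rotation method."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q:  A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q:  A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors:  V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def spectral_flr_estimator(xs, ys, weights, rho):
    """sum_k w_k lam_hat_{k,rho}^{-1} <(1/n) sum_j Y_j X_j, phi_hat_k> phi_hat_k."""
    n, p = len(xs), len(xs[0])
    gamma_hat = [[sum(x[i] * x[j] for x in xs) / n for j in range(p)] for i in range(p)]
    g_hat = [sum(y * x[i] for x, y in zip(xs, ys)) / n for i in range(p)]
    lam, vecs = jacobi_eigh(gamma_hat)
    order = sorted(range(p), key=lambda k: -lam[k])  # decreasing eigenvalues
    theta_hat = [0.0] * p
    for rank, k in enumerate(order):
        w = weights[rank] if rank < len(weights) else 0.0
        if w == 0.0:
            continue
        phi = [vecs[i][k] for i in range(p)]
        lam_rho = max(lam[k], n ** (-rho))           # truncated eigenvalue
        coef = w * sum(g * f for g, f in zip(g_hat, phi)) / lam_rho
        theta_hat = [th + coef * f for th, f in zip(theta_hat, phi)]
    return theta_hat


# Illustrative data: X = sum_i sqrt(lambda_i) G_i e_i with lambda_i = i^{-2}
rng = random.Random(0)
p, n, sigma = 5, 2000, 0.1
theta = [1.0, -0.5, 0.25, 0.0, 0.0]
xs = [[math.sqrt((i + 1) ** -2.0) * rng.gauss(0.0, 1.0) for i in range(p)] for _ in range(n)]
ys = [sum(t * x for t, x in zip(theta, xv)) + sigma * rng.gauss(0.0, 1.0) for xv in xs]
theta_hat = spectral_flr_estimator(xs, ys, weights=[1.0] * p, rho=0.4)
print([round(c, 3) for c in theta_hat])
```

With all weights equal to one and no eigenvalue below $n^{-\rho}$, the sketch reduces to the least-squares solution $\hat\Gamma^{-1}\frac1n\sum_j Y_jX_j$; the weights and the truncation at $n^{-\rho}$ are what regularize it when $p$ grows.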



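We stipulate that for $k > n^{\rho/\alpha}/\log n$ all weights $w_k$ are set equal to zero. For all other $k$ we have $\lambda_k \ge 2n^{-\rho}$ for $n$ sufficiently large, so that

$$\begin{aligned}
E\,\frac{\hat\lambda_k}{\hat\lambda_{k,\rho}^2}
&\le \frac{E\hat\lambda_k}{\lambda_k^2}\Big(1-\frac{1}{2+\log k}\Big)^{-2} + n^{2\rho}\lambda_k\,P\Big[\hat\lambda_k-\lambda_k < -\frac{\lambda_k}{2+\log k}\Big]\\
&\le \lambda_k^{-1} + \lambda_k^{-1}\Big\{O\Big(\frac{1}{\log k+1}\Big) + (2+\log k)^2\,O\big(n^{2\rho-1}\big)\Big\},
\end{aligned}$$

where we have used Markov's inequality and the fact that $E|\hat\lambda_k-\lambda_k|^2$ is bounded from above by the expected squared Hilbert–Schmidt norm of $\hat\Gamma-\Gamma$ and, hence, by $O(n^{-1})$ (see, e.g., [1]). That requires the following assumption:

$$\lambda_j - \lambda_{j+1} \ge \mathrm{const}\cdot j^{-\alpha-1} \tag{5.3}$$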

(see also [15]). We conclude that the second term in equation (5.2) has the same asymptotic order as

$$\{1+O(1/\log\log n)\}\cdot\frac{\sigma^2}{n}\sum_k w_k^2\lambda_k^{-1} + O\big(n^{-1}\log^{\alpha+1}n\big)$$

under the above restriction on the selection of the weights.
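Focusing on the first term in (5.2), we deduce by the Cauchy–Schwarz inequality that

$$\sum_k E\Big|\Big(\frac{w_k\hat\lambda_k}{\hat\lambda_{k,\rho}}-1\Big)\langle\hat\varphi_k,\theta\rangle\Big|^2 \le \Big\{\Big(\sum_k E\Big|\frac{w_k\hat\lambda_k}{\hat\lambda_{k,\rho}}-1\Big|^2|\langle\varphi_k,\theta\rangle|^2\Big)^{1/2} + \mathrm{const}\cdot\Big(\sum_k E|\langle\hat\varphi_k-\varphi_k,\theta\rangle|^2\Big)^{1/2}\Big\}^2.$$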

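We consider that

$$\begin{aligned}
\sum_k E|\langle\hat\varphi_k-\varphi_k,\theta\rangle|^2
&= \sum_k E\Big|\sum_j\langle\hat\varphi_k-\varphi_k,\varphi_j\rangle\langle\varphi_j,\theta\rangle\Big|^2
\le \mathrm{const}\cdot C_\Theta\sum_{k,j} j^{-2\beta}E|\langle\hat\varphi_k-\varphi_k,\varphi_j\rangle|^2\\
&= \mathrm{const}\cdot\sum_j j^{-2\beta}E|\langle\hat\varphi_j-\varphi_j,\varphi_j\rangle|^2 + \mathrm{const}\cdot\sum_j j^{-2\beta}\sum_{k\ne j}E|\langle\hat\varphi_k-\varphi_k,\varphi_j\rangle|^2\\
&\le \mathrm{const}\cdot\sum_j j^{-2\beta}E\|\hat\varphi_j-\varphi_j\|_2^2 + \mathrm{const}\cdot\sum_j j^{-2\beta}\sum_k E|\langle\hat\varphi_k,\varphi_j-\hat\varphi_j\rangle|^2\\
&\le \mathrm{const}\cdot\sum_j j^{-2\beta}E\|\hat\varphi_j-\varphi_j\|_2^2
\end{aligned}$$

by exploiting the orthonormality of the $\varphi_j$ and the $\hat\varphi_j$ as well as Condition T and Parseval's identity. Bhatia, Davis and McIntosh [1] show that the squared $L_2([0,1])$-distance between $\hat\varphi_j$ and $\varphi_j$ is bounded from above by the squared Hilbert–Schmidt norm of $\hat\Gamma-\Gamma$ multiplied by $8j^{2\alpha+2}$ via condition (5.3). Thus we have

$$\sum_k E|\langle\hat\varphi_k-\varphi_k,\theta\rangle|^2 = O(n^{-1}), \tag{5.4}$$

where the constants contained in $O(\cdot)$ do not depend on $\theta$ whenever

$$\beta > \alpha + 3/2. \tag{5.5}$$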

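Returning to the consideration of the first term in (5.2), we conclude that its asymptotic order reduces to that of

$$\sum_k|\langle\varphi_k,\theta\rangle|^2\,E\Big|\frac{w_k\hat\lambda_k}{\hat\lambda_{k,\rho}}-1\Big|^2.$$

Then this term is bounded from above by

$$\begin{aligned}
&\sum_k|w_k-1|^2|\langle\varphi_k,\theta\rangle|^2 + \mathrm{const}\cdot\sum_{k=1}^{\lfloor n^{\rho/\alpha}/\log n\rfloor}|\langle\varphi_k,\theta\rangle|^2\lambda_k^{-2}E|\hat\lambda_k-\lambda_k|^2 + O\big(n^{-2\beta\rho/\alpha}(\log n)^{2\beta}\big)\\
&\quad\le O\big(n^{-2\beta\rho/\alpha}(\log n)^{2\beta}\big) + \mathrm{const}\cdot C_\Theta/n + \sum_k|w_k-1|^2|\langle\varphi_k,\theta\rangle|^2
\end{aligned}$$

by utilizing Condition T and again the results of [1]. The term $O(n^{-2\beta\rho/\alpha}(\log n)^{2\beta})$ is asymptotically negligible [i.e., bounded by $O(1/n)$] whenever $\rho > \alpha/(2\alpha+3)$, as we have already imposed condition (5.5). It follows that the MISE of (5.1) may be reduced to its asymptotically efficient terms, that is,

$$E\|\hat\theta-\theta\|_2^2 = \{1+o(1)\}\Big(\sum_k|w_k-1|^2|\langle\varphi_k,\theta\rangle|^2 + \frac{\sigma^2}{n}\sum_k w_k^2\lambda_k^{-1}\Big) + O\big(n^{-1}\log^{\alpha+1}n\big). \tag{5.6}$$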

The right-hand side of (5.6), however, corresponds to the MISE of an oracle estimator which uses the true eigenvalues and eigenfunctions of $\Gamma$ instead of the empirical ones. Also, it follows from [20] and [11] that the estimator $\hat\theta$ as in (5.1) attains the sharp asymptotic minimax risk when the weights are chosen as

$$w_k = (1-\gamma\beta_k)_+,$$

writing $\beta_k = (1+k^{2\beta})^{1/2}$, with an appropriate deterministic parameter $\gamma$. More precisely, we consider $\gamma_n$, which we define as the unique zero of the function $\Phi = \Phi_1 - \Phi_2$ with

$$\Phi_1(x) = \sum_{k=1}^{\infty}\lambda_k^{-1}\beta_k(1-x\beta_k)_+,\qquad \Phi_2(x) = C_\Theta\,x\,n/\sigma^2$$

for $x > 0$, where $\Phi_1$ and $\Phi_2$ are continuous and monotonically decreasing and increasing, respectively. The selection $\gamma = \gamma_n$ leads to asymptotically sharp optimality (see, e.g., [11]). Clearly, we have $\gamma_n \asymp n^{-\beta/(2\beta+\alpha+1)}$. For any other order of $\gamma$, not even the convergence rates are optimal, as the required balance between the bias and the variance term is violated. By condition (5.5), our additional assumption that $w_k = 0$ for $k > n^{\rho/\alpha}/\log n$ is satisfied under this optimal selection of the weights when stipulating that $\gamma \ge n^{-\beta/(2\beta+1)}$, as we have assumed $\rho > \alpha/(2\alpha+3)$.
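Whether the weights $w_k = (1-\gamma\beta_k)_+$ with $\gamma \ge n^{-\beta/(2\beta+1)}$ indeed vanish for $k > n^{\rho/\alpha}/\log n$ is an asymptotic statement, and the logarithmic factor makes the required sample sizes large. The sketch below is our own illustration. It takes the smallest admissible value $\gamma = n^{-\beta/(2\beta+1)}$, which gives the largest number of nonzero weights, and compares the largest index with a nonzero weight to the cut-off for $\alpha = 2$, $\beta = 4$ (so that (5.5) holds) and $\rho = 0.45 \in (\alpha/(2\alpha+3), 1/2)$.

```python
import math


def last_nonzero_weight(n, beta):
    """Largest k with (1 - gamma * beta_k)_+ > 0 for gamma = n^{-beta/(2 beta + 1)},
    where beta_k = (1 + k^{2 beta})^{1/2}."""
    gamma = n ** (-beta / (2 * beta + 1))
    k = 0
    while gamma * math.sqrt(1.0 + (k + 1) ** (2 * beta)) < 1.0:
        k += 1
    return k


def weight_cutoff(n, alpha, rho):
    """The index bound n^{rho/alpha} / log n beyond which all weights must vanish."""
    return n ** (rho / alpha) / math.log(n)


alpha, beta, rho = 2, 4, 0.45
for n in (1e8, 1e16):
    print(n, last_nonzero_weight(n, beta), weight_cutoff(n, alpha, rho))
```

For $n = 10^8$ the last nonzero weight still lies beyond the cut-off; for $n = 10^{16}$ it lies well below it, in line with the "for $n$ sufficiently large" nature of the argument.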

Still, the suggested selector is an oracle choice, as it requires knowledge of the true eigenvalues $\lambda_j$. That motivates us to consider a data-driven selector $\hat\gamma$ of $\gamma$. First we split the sample $(\mathbf X,\mathbf Y)$ into two independent data sets $(\mathbf X_j,\mathbf Y_j)$, $j=1,2$. The first data set $(\mathbf X_1,\mathbf Y_1)$ consists of $m$ pairs $(X_j,Y_j)$, where $m \asymp n(1-1/\log n)$, and $(\mathbf X_2,\mathbf Y_2)$ contains all the other observations. We employ $(\mathbf X_1,\mathbf Y_1)$ to estimate the function $\theta$, while the second data set (training data) is used to provide a selector of $\gamma$. Concretely, we fix $\tilde\gamma$ as the unique zero of $\hat\Phi = \hat\Phi_1 - \Phi_2$, where

$$\hat\Phi_1(x) = \sum_{k=1}^{\infty}(\hat\lambda'_{k,\rho})^{-1}\beta_k(1-x\beta_k)_+. \tag{5.7}$$

Therein, $'$ indicates that the estimator is based on the second data set. Then we define our selector of $\gamma$ as $\hat\gamma = \operatorname{med}\{n^{-\beta/(3\beta+1)},\tilde\gamma,n^{-\beta/(2\beta+1)}\}$. This truncation takes into account the a priori knowledge about the true $\gamma_n$, so that $|\hat\gamma-\gamma_n| \le |\tilde\gamma-\gamma_n|$ almost surely for $n$ sufficiently large.
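The truncated selector is the projection of $\tilde\gamma$ onto the interval $[n^{-\beta/(2\beta+1)}, n^{-\beta/(3\beta+1)}]$. Whenever $\gamma_n$ lies in this interval, such a projection can only move the selector closer to $\gamma_n$, which is the almost-sure inequality stated above. In the small check below, the proxy $\gamma_n = n^{-\beta/(2\beta+\alpha+1)}$ (with the constant set to one) and the function name are our own assumptions.

```python
def truncated_selector(gamma_tilde, n, beta):
    """med{n^{-beta/(3 beta + 1)}, gamma_tilde, n^{-beta/(2 beta + 1)}}."""
    lower = n ** (-beta / (2 * beta + 1))
    upper = n ** (-beta / (3 * beta + 1))
    return sorted([lower, gamma_tilde, upper])[1]


n, alpha, beta = 1e6, 2, 4
gamma_n = n ** (-beta / (2 * beta + alpha + 1))  # proxy for the true gamma_n
for gamma_tilde in (1e-5, 1e-3, gamma_n, 0.05, 0.9):
    gamma_hat = truncated_selector(gamma_tilde, n, beta)
    print(gamma_tilde, gamma_hat, abs(gamma_hat - gamma_n) <= abs(gamma_tilde - gamma_n))
```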

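Thus, determining $\hat\gamma$ does not require knowledge of $P_X$. Now let us consider the MISE of the estimator $\hat\theta_{\hat\gamma}$, where the index indicates the incorporated choice of the parameter $\gamma$. By (5.6), we derive that

$$E\|\hat\theta_{\hat\gamma}-\theta\|_2^2 = o\big(n^{-2\beta/(2\beta+\alpha+1)}\big) + \{1+o(1)\}\Big(\sum_{k=1}^{\infty}|\langle\varphi_k,\theta\rangle|^2E|(1-\hat\gamma\beta_k)_+-1|^2 + \frac{\sigma^2}{m}\sum_{k=1}^{\infty}\lambda_k^{-1}E(1-\hat\gamma\beta_k)_+^2\Big),$$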

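where the terms contained in $o(1)$ do not depend on $\gamma$. As the asymptotic orders of $m$ and $n$ coincide, the estimator based on $m$ data attains the same asymptotic rates and constants as the estimator which uses all $n$ data, so our above calculations remain valid. Therefore, the estimator $\hat\theta_{\hat\gamma}$ attains the sharp minimax rates and constants whenever

$$\sum_{k=1}^{\infty}E|(1-\hat\gamma\beta_k)_+-(1-\gamma_n\beta_k)_+|^2|\langle\varphi_k,\theta\rangle|^2 + \frac{1}{n}\sum_{k=1}^{\infty}\lambda_k^{-1}E|(1-\hat\gamma\beta_k)_+-(1-\gamma_n\beta_k)_+|^2 = o\big(n^{-2\beta/(2\beta+\alpha+1)}\big) \tag{5.8}$$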

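uniformly with respect to $\theta\in\Theta$. The first term in (5.8) is bounded from above by $\mathrm{const}\cdot E|\hat\gamma-\gamma_n|^2$. The second term has the upper bound

$$O(c_n^{\alpha+1})\cdot E|\hat\gamma-\gamma_n|^2 + O(1/n)\cdot\sum_{k=1}^{\lfloor\mathrm{const}\cdot n^{1/(2\beta+1)}\rfloor}\lambda_k^{-1}P\big[\hat\gamma<\beta_k^{-1}\big]$$

for some sequence $(c_n)_n$ tending to infinity sufficiently slowly. We deduce by Markov's inequality that (5.8) is satisfied if

$$n^{\alpha/(2\beta+1)}\gamma_n^{-2\nu}E|\hat\gamma-\gamma_n|^{2\nu} + E|\hat\gamma-\gamma_n|^2 = o\big(n^{-2\beta/(2\beta+\alpha+1)}\big)$$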

for some fixed integer $\nu$. The assertion $|\hat\gamma-\gamma_n| > s_n$, for some positive-valued sequence $(s_n)_n \downarrow 0$ with $s_n/\gamma_n \to 0$, implies that

$$|\hat\Phi_1(\gamma_n+s_n)-\Phi_1(\gamma_n+s_n)| \ge C_\Theta s_n n/\sigma^2$$

or

$$|\hat\Phi_1(\gamma_n-s_n)-\Phi_1(\gamma_n-s_n)| \ge C_\Theta s_n n/\sigma^2 + |\Phi_1(\gamma_n)-\Phi_1(\gamma_n-s_n)|.$$
We have already imposed that p > a /(2a + 3) so that > n~ p for all k < 
{in — s n )~ 1 ^ . That, however, yields ||r' — r||ns — const. • s n n~ p ^~ l where 
$\|\cdot\|_{HS}$ denotes the Hilbert–Schmidt norm of an operator. Therein we have used the findings of [1] again and the monotonicity of the functions $\Phi_1$, $\hat\Phi_1$, $\Phi_2$ as well as the definitions of $\gamma_n$ and $\hat\gamma$. We deduce by Markov's inequality that

E\i- ln \ 2v = s^ + P[\^- ln \>s n ] 

= s 2 : + const. • n 2 ^s- 2 ^ 2 /E\\t' - T\\ 2 ^ 
for any integer $p$. As all moments of $\|X_1\|_2$ are finite by Condition X, we derive that

$$E\|\Gamma'-\Gamma\|_{HS}^{2p} = O\big((n-m)^{-p}\big),$$

where we recall that $\Gamma'$ is based on the training data set, thus on $n-m \asymp n/\log n$ observations. As $\rho < 1/2$, we conclude by a suitable choice of $(s_n)_n$ that

$$E|\hat\gamma-\gamma_n|^{2\nu} = O\big([o_n\gamma_n]^{2\nu}\big),$$

where $(o_n)_n$ denotes some sequence tending to zero at an algebraic rate. Choosing $\nu$ sufficiently large, we can finally verify (5.8), which yields the following proposition, summarizing the investigation carried out in this section.

Proposition 5.1. We consider the FLR model in the setting of Theorem 4.1 except the condition that $P_X$ is known. In addition, suppose that (5.3), (5.5) and $\rho\in(\alpha/(2\alpha+3),1/2)$ hold. Then the estimator (5.1) with the weight selector (5.7), which does not use $P_X$ in its definition, attains the sharp minimax rate and constant with respect to the mean integrated squared error, viewed uniformly over the function class $\Theta$ which is defined via Condition T.

Hence, under some additional conditions on the model, we have established sharp minimaxity in the case where $P_X$ is unknown. Only an arbitrary number between $\alpha/(2\alpha+3)$ and $1/2$ is supposed to be known.
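The parameter constraints collected in Condition T and Proposition 5.1 can be bundled into a small checker. It only encodes the inequalities $\beta > (\alpha+1)/2$, (5.5) and $\alpha/(2\alpha+3) < \rho < 1/2$ stated above. The function names are ours, and exact fractions are used so that boundary cases are decided correctly.

```python
from fractions import Fraction as F


def rho_interval(alpha):
    """Open interval (alpha / (2 alpha + 3), 1/2) of admissible truncation exponents rho."""
    alpha = F(alpha)
    return alpha / (2 * alpha + 3), F(1, 2)


def proposition_5_1_applies(alpha, beta, rho):
    """Condition T (beta > (alpha + 1)/2), condition (5.5) and the range of rho."""
    alpha, beta, rho = F(alpha), F(beta), F(rho)
    lo, hi = rho_interval(alpha)
    return beta > (alpha + 1) / 2 and beta > alpha + F(3, 2) and lo < rho < hi


print(rho_interval(2))                             # (Fraction(2, 7), Fraction(1, 2))
print(proposition_5_1_applies(2, 4, "0.45"))       # True
print(proposition_5_1_applies(2, "3.5", "0.45"))   # False: (5.5) needs beta > 3.5
```

For $\alpha \ge 0$ the condition (5.5) already implies $\beta > (\alpha+1)/2$; the redundant check simply mirrors the order in which the assumptions were introduced.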

6. Discussion and conclusions. We have proved equivalence of the FLR model and a white noise model involving an empirical covariance operator in Theorem 3.1. We mention that $\sigma$ and $P_X$ can be treated as genuine nuisance parameters in Section 3; more precisely, knowledge of those quantities is not needed to apply the data transformations.

In contrast, for the asymptotic approximation in Section 4, $P_X$ must be known. Nevertheless, Section 5 shows that, with respect to the MISE, the sharp asymptotic minimax risk carries over to the case of unknown design distribution. Furthermore, under specific parametric assumptions on $P_X$, the condition of known $P_X$ can obviously be justified. Cai and Hall [5] explicitly mention Gaussian processes as examples of the random design functions $X_j$. For instance, assuming that $X_j$ can be represented as $X_j(t) = \int_0^t a(s)\,dW_j(s)$ with independent standard Wiener processes $W_j$, as already suggested earlier, we realize that the function $a$ can be precisely reconstructed from only one observation $X_1$. Then, as $a$ is known, the distribution $P_X$ is known as well. Therefore, under this shape of $P_X$, the assumption of known $P_X$ is not unrealistic at all. This phenomenon is typical for the functional data approach and does not occur in multivariate linear regression with finite-dimensional covariates. From that point of view, the assumption of known design distribution causes less trouble in FLR than in more standard regression problems. Still, this does not address the completely nonparametric case for $P_X$ and $\theta$.

As an interesting restriction, we have assumed that $\beta > (1+\alpha)/2$ in Condition T. Therefore, the quality of the approximation of the target curve $\theta$ in the orthonormal basis consisting of the eigenfunctions of the covariance operator of the design variables must be sufficiently high. If this basis consisted of Fourier polynomials, then that assumption could be interpreted as a smoothness condition on $\theta$. That corresponds to the theorems in [19] and [2], where Hölder conditions corresponding to $\beta > 1/2$ are imposed in order to prove asymptotic equivalence of the white noise model on the one hand and density estimation and nonparametric regression on the other hand. Otherwise, counterexamples can be constructed (see [3]). To the best of our knowledge, our work represents the first proof of white noise equivalence in a statistical inverse problem. It seems reasonable that the essential condition extends to $\beta > (1+\alpha)/2$ in this setting, as the selection $\alpha = 0$ describes the setting of direct estimation (noninverse problems). Still, the question of whether our results are extendable to some $\beta < (1+\alpha)/2$ remains open. In Section 5 we have studied the case of unknown $P_X$; however, the regularity parameter $\beta$ is still assumed to be known. Therefore, another interesting problem, which cannot be addressed within the framework of this paper, is whether this sharp risk can be achieved by an adaptive estimator, which does not use $\beta$ and $C_\Theta$ in its construction. Approaches to adaptivity in FLR are studied in [4]; however, that report seems to focus on optimal rates rather than optimal constants.

Also, combining Theorem 4.1 and the results of Brown and Low [2], we conclude that, under reasonable conditions, the FLR model is also asymptotically equivalent to the standard nonparametric regression problem, under which the data

$$Y_j = [\Gamma^{1/2}\theta](x_j) + \sigma\varepsilon_j,\qquad j=1,\ldots,n,$$

are observed, where the $\varepsilon_j$ are i.i.d. and $N(0,1)$-distributed, and the homogeneous fixed design setting $x_j = j/n$, $j = 1,\ldots,n$, is applied.

Acknowledgments. The author is grateful to Markus Reiss for a discus- 
sion on this paper and to the reviewers for their inspiring comments. 